2013-04-27

Node over Express - Autoload

Preface

This is the 2nd post of the Node over Express series (the previous one is Configuration). In this post, I’d like to discuss a famous pain point in Node.js.

Pain Point

There is a well-known Lisp joke:

A top hacker successfully stole the last 100 lines of a top-secret program from the Pentagon. Because the program was written in Lisp, the stolen code was nothing but closing brackets.

The joke is that Lisp has too many brackets. Node.js has a similar issue: too many require calls. Open any Node.js file and you will usually find several lines of require at the top.

Because of Node’s sandbox model, the developer has to require the same resources again and again in every file. Writing or reading lines of meaningless require is no fun. Worse, it becomes a nightmare once a developer wishes to replace one library with another.

Rails Approaches

“Require hell” isn’t only for node.js, but also for Ruby apps. Rails has solved it gracefully, and the developer barely needs to require anything manually in Rails.

There are two kinds of dependencies in a Rails app: external resources and internal resources.

External Resources

External resources are classes packaged in Ruby gems. In a Ruby application, the developer describes the dependencies in the Gemfile and loads them with Bundler. Some frameworks, such as Rails, are already integrated with Bundler; when using them, the developer doesn’t need to do anything manually, and all the dependencies are required automatically. For others, use bundle exec to run the program in a Ruby runtime with all the gems available.

Internal Resources

Internal resources are the classes declared in the app: models, services, or controllers. Rails uses Railtie to require them automatically. A resource is loaded the first time it is used, so the requiring process is “lazy”. (This description isn’t entirely precise, because Rails behaves differently in the production environment: it loads all the classes at launch for performance reasons.)

Autoload in Node.js

Rails avoids “require hell” with two autoload mechanisms. There are still debates about whether autoload is good or not, but at the very least it frees the developer from dull dependency management and increases productivity. In most cases, developers love autoload.

So to avoid “require hell” in Node.js, I prefer an autoload mechanism. But because the type systems of Node.js and Ruby differ significantly, we cannot copy the mechanism from Ruby to Node as is. Before diving into the solution, we need to understand the differences first.

Node.js Module System

There are a number of similarities between Node.js and Ruby; things in Node.js usually have equivalents in Ruby. For example, a package in Node is similar to a gem in Ruby, npm corresponds to gem and Bundler, and package.json takes on the responsibilities of Gemfile and Gemfile.lock. These similarities make it possible to port autoload from Ruby to Node.

Despite these similarities, there are also significant differences. One of the major ones is the type system and module sandbox in Node.js, which work quite differently from the Ruby type system.

JavaScript isn’t a real OO language, so it doesn’t have a real type system. All the types in JavaScript are actually functions, which are stored in local variables rather than in a type system. Node.js loads files into separate sandboxes, and all local variables are isolated between files to avoid “global leak”, a well-known, deep-seated bad part of JavaScript. As a result, a Node.js developer needs to require the types in use again and again in every file.

Ruby does a lot better here. Thanks to its well-designed type system, types are shared across the whole runtime, so a developer only needs to require the types that are not yet loaded.

So Node.js programs contain many more require statements than Ruby programs. And because of the design of Node.js and JavaScript, the issue is harder to resolve.

Global Variable

In the browser, the other JavaScript runtime besides Node, global variables are very common. Global variables are easily abused, which brings global leaks to badly written JavaScript programs and drives thousands of developers up the wall. JavaScript developers are so scared of global leaks that they designed a strict isolation model into Node.js. To my understanding, the isolation does avoid global leaks effectively, but at the same time it brings tens of require statements to every file, which is not acceptable either.

In fact, with the help of JSLint, CoffeeScript, and some other tools, developers can avoid global leaks easily. Global sharing isn’t the root of all evil: if abuse is avoided, I believe a reasonable level of global sharing can be useful and helpful. Node.js actually has a built-in global sharing mechanism.

To share values across files, a special variable called global is needed. It can be accessed in every file, and its value is shared across files.

Besides sharing values, global has another important feature: Node treats global as the default context, so you can refer to its properties without qualifying them explicitly. So SomeType === global.SomeType.
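As a minimal sketch of the behaviour described above (plain JavaScript rather than CoffeeScript; SomeType and its greet method are hypothetical names used only for illustration), a value published on global can be reached through the bare identifier:

```javascript
// Publish a hypothetical type on Node's global object. The class expression is
// assigned directly so that no local variable named SomeType exists in this file.
global.SomeType = class {
  greet() {
    return 'hello';
  }
};

// The bare identifier has no local binding, so it resolves through the global object.
const sharedIsSame = SomeType === global.SomeType;
const greeting = new SomeType().greet();

console.log(sharedIsSame, greeting);
```

Any file loaded after this one can use SomeType in the same way without requiring anything.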

With the help of global, we find a way to share types across files naturally.

JS Property

Rails’ autoload mechanism loads classes lazily: it only loads a class when it is used for the first time. It is a neat feature, and Rails achieves it by tracking the “uninitialized constant” exception. Tracking exceptions is hardly feasible for implementing a similar feature in Node.js, so I chose a different approach: properties.

A property (an attribute in Ruby) causes a method (function) to be invoked when a field of an object is accessed. Properties are a common feature among OO languages but a “new” feature in JavaScript. They were introduced in the ECMAScript 5 standard, which lets developers declare a property on an object with the Object.defineProperty API. With a property, we can hook a callback onto a type variable and require the type when it is accessed, so the module won’t be required until it is used. In addition, Node’s require function has a built-in cache: it won’t load a file twice, but returns the value from its cache instead.

With property, we make the autoload lazy!
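The idea can be sketched in plain JavaScript as follows. This is only an illustration under assumptions: defineLazy, Services.Mailer, and the injected load callback are hypothetical names, and the callback stands in for a call such as require(fullName) so that the sketch runs without any files on disk. Because no real require is involved, the sketch caches the value itself.

```javascript
// Define a lazily loaded property on `host`.
// `load` is a hypothetical loader; in a real app it would wrap require(),
// and Node's module cache would take care of the memoizing.
function defineLazy(host, name, load) {
  let cached;
  let loaded = false;
  Object.defineProperty(host, name, {
    enumerable: true,
    get() {
      if (!loaded) {
        cached = load();
        loaded = true;
      }
      return cached;
    },
  });
}

const Services = {};
let loadCount = 0;

defineLazy(Services, 'Mailer', () => {
  loadCount += 1;
  return { send: (to) => `sent to ${to}` };
});

const countBeforeAccess = loadCount; // defining the property loads nothing
const first = Services.Mailer;       // the first access triggers the load
const second = Services.Mailer;      // later accesses reuse the cached value

console.log(countBeforeAccess, loadCount, first === second);
```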

My Implementation

To make autoload work, we need to create a magic host object to hold the type variables. In my implementation, I call the magic object AutoLoader.
We also need to require a bootstrap script when the app starts, which describes which types should be required and how.

Bootstrap Script: initEnvironment.coffee

global.createAutoLoader = require('./services/AutoLoader')
global.createPathHelper = require('./services/PathHelper')
global.rootPath = createPathHelper(__dirname, true)
global.Configuration = require(rootPath.config('configuration'))
global.Services = createAutoLoader rootPath.services()
global.Routes = createAutoLoader rootPath.routes()
global.Records = createAutoLoader rootPath.records()
global.Models = createAutoLoader rootPath.models()
global.assets = {} # initialize this context for connect-assets helpers

The script sets up the autoload hosts for all the services, routes, records, and models of my app. We can then reference the types as follows:

Sample Usage

Records.User.findById uid, (err, user) ->
  badge = new Models.Badget(badgeInfo)
  user.addBadge badge
  user.save()

The initEnvironment.coffee script uses two very important classes:

AutoLoader: the class that hosts the type variables. All the magic happens here.
PathHelper: the class that handles combining paths.

The detailed implementations are as follows:

Autoload

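_ = require('lodash')
path = require('path')
fs = require('fs')
createPathHelper = require('./PathHelper')

# Define a lazy getter on the host: the module is only required on first
# access, and Node's require cache serves every later access.
createLoaderMethod = (host, name, fullName) ->
  host.__names.push name
  Object.defineProperty host, name,
                        get: ->
                          require(fullName)

# Host object whose properties resolve to lazily required modules.
class AutoLoader
  constructor: (source) ->
    @__names = []
    for name, fullName of source
      extName = path.extname fullName
      # Expose only what require can load: a known extension, or no extension at all.
      createLoaderMethod(this, name, fullName) if require.extensions[extName]? or extName == ''

# Turn a directory path into a { name: fullPath } map of its entries.
expandPath = (rootPath) ->
  createPathHelper(rootPath).toPathObject()

# Turn an array of file paths into a { name: fullPath } map.
buildSource = (items) ->
  result = {}
  for item in items
    extName = path.extname(item)
    name = path.basename(item, extName)
    result[name] = item
  result

# Accept a directory path, an array of file paths, or a ready-made { name: fullPath } map.
createAutoLoader = (option) ->
  pathObj = switch typeof(option)
    when 'string'
      expandPath(option)
    when 'object'
      if option instanceof Array
        buildSource(option)
      else
        option
  new AutoLoader(pathObj)

createAutoLoader.AutoLoader = AutoLoader
exports = module.exports = createAutoLoader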

PathHelper

_ = require('lodash')
fs = require('fs')
path = require('path')

# Build a helper function rooted at rootPath: calling it joins extra segments onto the root.
createPathHelper = (rootPath, isConsolidated) ->
  rootPath = path.normalize rootPath

  result = (args...) ->
    return rootPath if args.length == 0
    parts = _.flatten [rootPath, args]
    path.join.apply(this, parts)

  # Map every entry of the root directory: base name (without extension) -> full path.
  result.toPathObject = ->
    self = result()
    files = fs.readdirSync(self)
    pathObj = {}
    for file in files
      fullName = path.join(self, file)
      extName =  path.extname(file)
      name = path.basename(file, extName)
      pathObj[name] = fullName
    pathObj

  # Attach a child helper for each subdirectory, e.g. rootPath.services().
  result.consolidate = ->
    pathObj = result.toPathObject()
    for name, fullName of pathObj
      stats = fs.statSync(fullName)
      result[name] = createPathHelper(fullName) if stats.isDirectory()
    result

  if isConsolidated
    result.consolidate()
  else
    result

exports = module.exports = createPathHelper

The code above is part of Express over Node. To access the complete codebase, please check out the repo on GitHub.

Beyond the content itself, I want to say thank you to my English teacher, Marina Sarg, who helped me a lot with this series of blog posts. Without her, this series would not exist. Marina, thank you very much.

2013-04-25

Node over Express - Configuration

Preface

I have been working on Node.js related projects for quite a while, and have built apps with Node both for clients and as personal projects, such as LiveHall, CiMonitor, etc. I promised someone that I would share my experience with Node, and today I’ll begin to work on this. This is the first post of the series.

Background

In this post, I would like to talk about configuration in Node, which is a common problem we need to solve in apps.

Problems related to configuration aren’t new, and there are dozens of mature solutions, but for Node.js apps there is still something worth discussing.

Configuration can perhaps be treated as a special kind of data, and developers usually prefer a data language to describe it. Here are some examples:

.NET and Java developers usually use Xml to describe their configuration
Ruby developers prefer Yaml as the configuration language
JavaScript developers tend to use Json

Data languages are convenient, because developers can easily build a DSL on top of them and then describe the configuration with that DSL. But is a data language the best option available? Is it really suitable for all scenarios?

Before answering these questions, I would like to say something about the problem we’re facing. There is one requirement common to all kinds of configuration solutions: default values and overriding.

For example, a web app uses port 80 by default, but in the development environment we prefer a port number over 1024; 3000 is a popular choice. That means we need to provide 80 as the default value of the port but override it with 3000 in the development environment.

Of the languages mentioned above, only Yaml provides native support for inheritance and overriding; Xml and Json do not. That means we need to implement the mechanism on our own. Taking Json as an example, we might write the configuration this way:

Sample Json configuration

{
  "default": {
    "port": 80,
    "serveAssets": true
  },
  "development": {
    "port": 3000,
    "database": "mongodb://localhost/development"
  },
  "test": {
    "database": "mongodb://localhost/test"
  },
  "production": {
    "serveAssets": false,
    "database": "mongodb://ds0123456.mongolab.com:43487/my_sample_app"
  }
}

The Json snippet above is a typical example of web app configuration: it has a default section that provides the default values for all environments, plus three sections for specific environments. To apply it correctly to our app, we need to load and parse the Json file to get all the data, then load the values of the default section, then override them with the values from the specific environment. In addition, we might want validation that yields an error when the requested environment doesn’t exist.
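The loading steps just described could be sketched like this. It is a minimal illustration, not a full solution: loadConfig is a hypothetical helper, the configuration is inlined instead of being read from a file so the sketch is self-contained, and the merge is shallow, so nested sections would need extra handling.

```javascript
// The same structure as the sample Json configuration, inlined for the sketch.
const rawConfig = {
  default: { port: 80, serveAssets: true },
  development: { port: 3000, database: 'mongodb://localhost/development' },
  test: { database: 'mongodb://localhost/test' },
  production: {
    serveAssets: false,
    database: 'mongodb://ds0123456.mongolab.com:43487/my_sample_app',
  },
};

// Start from the default section, then override with the environment's values.
function loadConfig(raw, environment) {
  const known = Object.prototype.hasOwnProperty.call(raw, environment);
  if (!known || environment === 'default') {
    throw new Error(`Unknown environment: ${environment}`);
  }
  return Object.assign({}, raw.default, raw[environment]);
}

const devConfig = loadConfig(rawConfig, 'development');
const testConfig = loadConfig(rawConfig, 'test');

console.log(devConfig.port, testConfig.port);
```

Even this small version already needs its own merging and validation logic, which is the overhead the following issues make worse.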

This solution looks simple and seems to work, but when you try to apply it to a real-life app, you need to watch out for some pitfalls.

Issue 1: Confidential Values

In the real world, configuration values can be sensitive and need to be kept confidential. They might include the credentials to access your database, or the key to decrypt the cookies. They might also include a private certificate that identifies and authenticates the app to other services. In these scenarios, you need to protect your configuration in order to avoid big trouble!

To solve the issue, you might think about adding a new feature that enables you to encrypt confidential values or to load them from a different, safe source. To achieve that, you might need to add another layer of DSL, which adds more complexity to your app and makes your code harder to debug and maintain.

Issue 2: Dynamic Data

As a solution to the first issue, one could store the environment-related but sensitive data in environment variables. The solution is simple and works perfectly, so I highly recommend it. However, doing this means you need the capability to load values not only from Json directly but also from environment variables.
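A small sketch of reading values from the environment with a fallback is shown below. The helper fromEnv, the variable names PORT and DATABASE_URL, and the example URL are assumptions for illustration; fakeEnv stands in for process.env so that the result does not depend on the machine running it.

```javascript
// Read a value from an environment-like object, falling back when it is unset or empty.
function fromEnv(env, name, fallback) {
  const value = env[name];
  return value === undefined || value === '' ? fallback : value;
}

// Stand-in for process.env; in a real app, pass process.env instead.
const fakeEnv = { DATABASE_URL: 'mongodb://db.example.com/app' };

const settings = {
  // Environment variables are always strings, so convert the port explicitly.
  port: Number(fromEnv(fakeEnv, 'PORT', 3000)),
  database: fromEnv(fakeEnv, 'DATABASE_URL', 'mongodb://localhost/development'),
};

console.log(settings);
```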

Sometimes, such as when deploying your app to Heroku or Nodejitsu, the case becomes trickier. After the app is deployed to Heroku or Nodejitsu, the default values are provided in Json directly, and some of them need to be overridden with values from environment variables, or vice versa. These tricky requirements can easily blow your mind and your code away. They lead to a complicated DSL design and hundreds of lines of implementation, just to load your configuration properly. Obviously that is not a good idea.

Issue 3: Complicated Inheritance Relationship

Scared by the cases above? No? Then how about complicated inheritance relationships between environments?

In some big and complicated web apps, there might be more than three basic environments, such as:

Development: for developers to develop the app locally
Test: for developers to run unit or function test locally, such as mocha tests
Regression: for developers or QAs to run regression tests, such as cucumber tests
Integration: for QAs or Ops to test the integration with other apps
Staging: for ops and QAs to test the app in production like environment before it really goes live
Production: the environment serves your real users
…

When trying to write configurations for these environments, one might find that there are only a few differences between them. To make life easier and avoid redundancy, introducing inheritance between configurations might be a good idea.

As a consequence, the whole configuration becomes a set of environments with complex inheritance relationships. And to support this kind of configuration inheritance, a more complex DSL and hundreds of lines of code are needed.

Some Comments

My assumptions above may seem a little too complex. To some people, they might be the “WORST CASE SCENARIO” and hard to come by. But in my experience, they are very common when building real web apps with Node. So if solving them isn’t too hard, it is better to take them seriously and solve them gracefully.

Ruby developers might think they’re lucky because Yaml supports inheritance natively. But confidential data and dynamic data are still troublesome.

My Solution

After learning a number of painful lessons, I figured out a simple but working solution: Configuration as Code. Describe the configuration in the same language that the business logic is written in!

Configuration as code isn’t a new concept, but it is extremely handy when you use it in Node applications! Let me explain why and how it works:

To protect confidential configuration values, one should store them in environment variables, which are only accessible on the specific server.
Then one can load these values from the environment variables dynamically.

Doing this in a data language such as Xml, Json or Yaml can be hard, but it becomes as easy as taking candy from a baby when it is done in the programming language the application itself uses, such as Ruby or JavaScript.

As for configuration inheritance, OO languages already provide a very handy inheritance mechanism. Why invent another one? Why not just use it? As for value overriding, OO programming calls it polymorphism. The only difference from the typical scenario is that we override values instead of behaviors. But that isn’t an issue, because a value can be the result of a behavior, right?

By now, I assume everyone has a pretty good idea of what I am saying. If so, the code below, a standard Node.js file written in CoffeeScript, should be quite clear:

Configuration as Code Example

process.env.NODE_ENV = process.env.NODE_ENV?.toLowerCase() ? 'development'
class Config
  port: 80
  cookieSecret: '!J@IOH$!BFBEI#KLjfelajf792fjdksi23989HKHD&&#^@'
class Config.development extends Config
  port: 3009
  redis:
    uri: 'redis://localhost:6379'
  mongo:
    uri: 'mongodb://localhost'
class Config.test extends Config.development
class Config.heroku extends Config
  cookieSecret: process.env.COOKIE_SECRET
  redis:
    uri: process.env.REDISCLOUD_URL
  mongo:
    uri: process.env.MONGOLAB_URI
module.exports = new Config[process.env.NODE_ENV]()

See, with this approach, one can describe the configuration easily and clearly in a few lines of code, with built-in support for loading dynamic values and for configuration inheritance and overriding.
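For readers who don’t use CoffeeScript, roughly the same idea can be sketched in plain JavaScript. This is an approximation under assumptions, not a translation of the file above: buildConfig is a hypothetical wrapper that takes the environment object as a parameter, getters are used to show that a value can be the result of a behavior, and the cookie secret is a placeholder.

```javascript
// Build the configuration for one environment. `env` stands in for process.env.
function buildConfig(env, environmentName) {
  class Config {
    get port() { return 80; }
    get cookieSecret() { return 'development-only-placeholder'; }
  }

  class Development extends Config {
    get port() { return 3009; }
    get mongo() { return { uri: 'mongodb://localhost' }; }
  }

  // Test inherits everything from Development.
  class Test extends Development {}

  class Heroku extends Config {
    get cookieSecret() { return env.COOKIE_SECRET; }
    get mongo() { return { uri: env.MONGOLAB_URI }; }
  }

  const environments = { development: Development, test: Test, heroku: Heroku };
  const name = (environmentName || 'development').toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(environments, name)) {
    throw new Error(`No configuration for environment: ${name}`);
  }
  return new environments[name]();
}

const testConfig = buildConfig({}, 'test');
const herokuConfig = buildConfig({ COOKIE_SECRET: 's3cret' }, 'HEROKU');

console.log(testConfig.port, herokuConfig.port, herokuConfig.cookieSecret);
```

In an app, the call would look like buildConfig(process.env, process.env.NODE_ENV), and its result is what the module exports.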

In fact, this approach might work better than expected! Here are some additional free benefits:

Only one configuration is needed when the app is deployed to the cloud, because all the host-specific settings are usually provided via environment variables in a PaaS.
The configuration can contain some simple and straightforward logic, which can be very useful, especially if there is a naming convention in the configuration. But complicated or tricky logic should be strictly avoided, because it hurts readability and maintainability.
It is easy to write tests for configurations to ensure the values are properly set. This can be very handy when there are complicated inheritance relationships between configurations, or when the configuration contains some simple logic.
Code that isn’t related to the current environment is never instantiated or executed, which helps avoid the overhead of instantiating unused expensive resources and avoids errors caused by incompatibilities between environments.
You get a runtime error when the configuration for the environment doesn’t exist.

2013-04-18

Mac OS X case-insensitive file system pitfall

I was working on the YouTube video playback feature for LiveHall last night and got it working on my local dev box, which runs Mac OS X. Then I deployed the code to Heroku, without any regression.

But this morning, when I had to demonstrate the new features, I hit a server error! It said that 1 of the 4 JavaScript files was missing, so the Jade template failed to render.

This was a very weird issue. I tried the same data on my local dev box once again, and it worked perfectly! But it did yield an error on production! Then I used the Heroku Toolbelt to run the ls command on production, and found that all 4 CoffeeScript files were there.
Then I tried to force Heroku to redeploy the app by using git push --force, but the issue remained!
Then I even dived into the dependency packages connect-assets and snockets, but still found nothing useful.

Same code, same data, but different result! Very odd issue!

After half an hour of fighting with the code, I suddenly noticed that the file name is RevealJSPresenter.coffee, with a capital “S”, while I referenced the file as #= require ./presenter/RevealJsPresenter, with a lowercase “s”!

Snockets relies on the OS to locate the file. My local dev environment is a Mac. Although OS X allows the user to explicitly format HFS+ in case-sensitive mode, it is case-insensitive by default, so snockets can locate the file even when the case is wrong.
But Heroku, which I assume runs Linux, has a file system that is strictly case-sensitive, so once deployed there snockets cannot locate the file because of the case mismatch.

To resolve the bug, I renamed the file in RubyMine, then tried to commit in the terminal.
But when I committed, I hit another very interesting issue: git said that no file had changed, so it refused to commit.
It is the same underlying issue: because the file system is case-insensitive, git cannot detect the renaming.

This problem is even more common when coding on Windows while CI or production runs on Linux. The usual solution is to pull the code in a case-sensitive environment, then rename the file and commit it there.

But I found an easier way to do it:

Use git mv in the terminal to rename the file, which forces git to track the renaming.

Most Git GUIs are also able to track file name case changes, so you can try to commit the change with a tool such as RubyMine or SourceTree.

2013-04-10

Pitfall in matching line head and line end in regexp

I usually use ^ and $ to verify user input. For example,
I use the following regexp to verify whether a user input is a valid Gmail address:

Matching Gmail

^[a-zA-Z_\.]+@gmail\.com$

But in fact it is potentially vulnerable!
According to the RegExp documentation, ^ and $ match the beginning and end of a line, not of the whole string!
So I might fall into a pitfall when a user tries to fool me with the following input:

bad input

"[email protected]\n<script>alert('bang!')</script>"

Since there is a \n in the string, $ doesn’t match the end of the string but the position just before the \n, so the whole string is accepted as valid input when it actually isn’t!

To avoid this issue, we should stick to \A and \z, which literally mean the beginning of the string and the end of the string!
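The same pitfall can be reproduced in JavaScript, with one caveat: JavaScript has no \A and \z. There, ^ and $ are line anchors only when the m (multiline) flag is set; without it they anchor to the whole string. The sketch below is therefore an analogy to the engine behaviour described above, and the address someone@gmail.com is a made-up example.

```javascript
// With the m flag, ^ and $ match at line boundaries, as in the pitfall above.
const lineAnchored = /^[a-zA-Z_.]+@gmail\.com$/m;
// Without the m flag, ^ and $ anchor to the start and end of the whole string.
const stringAnchored = /^[a-zA-Z_.]+@gmail\.com$/;

const badInput = "someone@gmail.com\n<script>alert('bang!')</script>";

const acceptedByLineAnchors = lineAnchored.test(badInput);     // true: the first line matches
const acceptedByStringAnchors = stringAnchored.test(badInput); // false: the input does not end after "com"
const plainAddressAccepted = stringAnchored.test('someone@gmail.com');

console.log(acceptedByLineAnchors, acceptedByStringAnchors, plainAddressAccepted);
```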

2013-04-07

Multiple Project Summary Reporting Standard - cctray xml feed

CCTray.xml is an RSS-like XML feed of CI server build status, originally developed for CruiseControl.NET.
ThoughtWorks described it in a standard called “Multiple Project Summary Reporting Standard”, which has become a kind of unofficial standard for CI server feeds and is widely supported by all kinds of popular CI servers.

You can find the feed as described below:

And according to cc_dashboard, there are some exceptions that are not included in the document.

An additional “Pending” activity
An additional “Unknown” status. I’ve seen Unknown reported by CruiseControl.rb when project builds are serialized (“Configuration.serialize_builds = true” set in .cruise/site_config.rb) and one build is waiting for another build to finish. I’ve seen Unknown reported by Hudson when a project is disabled.

The following is a patched version of Multiple Project Summary Reporting Standard.

Multiple Project Summary Reporting Standard

Introduction

Various Continuous Integration monitoring / reporting tools exist. Examples are:

These tools work by polling Continuous Integration servers for summary information and presenting it appropriately to users.

If a Continuous Integration server can offer a standard summary format, and a reporting tool can consume the same, then we get interoperability between reporting tools and CI Servers.

Description

Summary information will be available as a plain XML string retrievable through an http GET request.

The format of the XML will be as follows:

Summary

A single <Projects> node, the document root, which contains 0 or many <Project> nodes.

Each <Project> may have the following attributes:

name	description	type	required
name	The name of the project	string	yes
activity	The current state of the project	string enum : Sleeping, Building, CheckingModifications, Pending	yes
lastBuildStatus	A brief description of the last build	string enum : Success, Failure, Exception, Unknown	yes
lastBuildLabel	A referential name for the last build	string	no
lastBuildTime	When the last build occurred	DateTime	yes
nextBuildTime	When the next build is scheduled to occur (or when the next check to see whether a build should be performed is scheduled to occur)	DateTime	no
webUrl	A URL for where more detail can be found about this project	string (URL)	yes

Clients that consume this XML should not rely on any optional attribute being present, and should degrade their functionality gracefully.

Example

CCTray.xml Sample

<Projects>
    <Project
        name="SvnTest"
        activity="Sleeping"
        lastBuildStatus="Exception"
        lastBuildLabel="8"
        lastBuildTime="2005-09-28T10:30:34.6362160+01:00"
        nextBuildTime="2005-10-04T14:31:52.4509248+01:00"
        webUrl="http://mrtickle/ccnet/"/>
    <Project
        name="HelloWorld"
        activity="Sleeping"
        lastBuildStatus="Success"
        lastBuildLabel="13"
        lastBuildTime="2005-09-15T17:33:07.6447696+01:00"
        nextBuildTime="2005-10-04T14:31:51.7799600+01:00"
        webUrl="http://mrtickle/ccnet/"/>
</Projects>
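To illustrate how a client might consume such a feed while degrading gracefully, here is a rough sketch. It is not a real XML parser: it assumes double-quoted attribute values that contain no > character, and parseProjects and readAttributes are hypothetical helper names. A production client would use a proper XML library instead.

```javascript
// A feed in the shape shown above; the second project omits the optional
// lastBuildLabel and nextBuildTime attributes.
const feed = `
<Projects>
    <Project
        name="SvnTest"
        activity="Sleeping"
        lastBuildStatus="Exception"
        lastBuildLabel="8"
        lastBuildTime="2005-09-28T10:30:34.6362160+01:00"
        nextBuildTime="2005-10-04T14:31:52.4509248+01:00"
        webUrl="http://mrtickle/ccnet/"/>
    <Project
        name="HelloWorld"
        activity="Sleeping"
        lastBuildStatus="Success"
        lastBuildTime="2005-09-15T17:33:07.6447696+01:00"
        webUrl="http://mrtickle/ccnet/"/>
</Projects>`;

// Collect name="value" pairs from the inside of one tag.
function readAttributes(rawAttributes) {
  const attributes = {};
  for (const [, key, value] of rawAttributes.matchAll(/([\w:.-]+)\s*=\s*"([^"]*)"/g)) {
    attributes[key] = value;
  }
  return attributes;
}

function parseProjects(xml) {
  const projects = [];
  // \b keeps the pattern from matching the <Projects> root element.
  for (const [, rawAttributes] of xml.matchAll(/<Project\b([^>]*)>/g)) {
    const attributes = readAttributes(rawAttributes);
    projects.push({
      name: attributes.name,
      activity: attributes.activity,
      lastBuildStatus: attributes.lastBuildStatus,
      lastBuildTime: attributes.lastBuildTime,
      webUrl: attributes.webUrl,
      // Optional attributes degrade to null instead of breaking the client.
      lastBuildLabel: 'lastBuildLabel' in attributes ? attributes.lastBuildLabel : null,
      nextBuildTime: 'nextBuildTime' in attributes ? attributes.nextBuildTime : null,
    });
  }
  return projects;
}

const projects = parseProjects(feed);
console.log(projects.map((project) => `${project.name}: ${project.lastBuildStatus}`));
```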

Schema

cctray.xml Schema

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Projects">
<xs:complexType>
<xs:sequence>
<xs:element name="Project" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:attribute name="name" type="xs:NMTOKEN" use="required" />
<xs:attribute name="activity" use="required">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="Sleeping" />
<xs:enumeration value="Building" />
<xs:enumeration value="CheckingModifications" />
<xs:enumeration value="Pending" />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="lastBuildStatus" use="required">
<xs:simpleType>
<xs:restriction base="xs:NMTOKEN">
<xs:enumeration value="Exception"/>
<xs:enumeration value="Success"/>
<xs:enumeration value="Failure"/>
<xs:enumeration value="Unknown"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="lastBuildLabel" type="xs:NMTOKEN" use="optional" />
<xs:attribute name="lastBuildTime" type="xs:dateTime" use="required" />
<xs:attribute name="nextBuildTime" type="xs:dateTime" use="optional" />
<xs:attribute name="webUrl" type="xs:string" use="required" />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

ThoughtWorkshop

Digital Bigs in my thought
