XmlConfigFile Tutorial

by Maik Schmidt


Introduction

A lot of modern software can be customized through configuration files, and more and more applications use XML as the format for these files. This makes sense for at least the following reasons:

  • XML files can be edited easily, and tool support for editing them keeps improving.
  • XML provides everything you need in typical configuration files: hierarchical data, comments, ...
  • XML can be processed easily by a lot of modern tools and nearly all programming languages.
  • Usually, accessing a configuration file is not performance critical, because many configuration parameters are read only once.

So, creating and reading XML configuration files seems to be easy, but what about accessing the content of such a file? Many applications use "DOM tree traversing" or convert the XML document into a simpler internal structure (e.g. Hashes). Some advantages of XML get lost by doing so. If you want to change and write back a configuration, for example, you will have to write code that converts your internal structure back to XML.

But there is a better, easier, and standardized way for accessing elements in an XML document: XPath. With XmlConfigFile you can access configuration parameters via XPath expressions. This tutorial will show you how to do this.
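
To make the contrast concrete, here is a minimal sketch that uses REXML directly rather than XmlConfigFile. It reads the same value once by walking the child elements and once with a single XPath expression. The constant INTRO_XML and both method names are made up for this illustration.

```ruby
require 'rexml/document'

# A tiny stand-in for a configuration file (illustration only).
INTRO_XML = <<~XML
  <config>
    <version>1.7</version>
    <greeting lang="en">Hello, world!</greeting>
  </config>
XML

# Manual traversal: visit the root's children and pick the one we want.
def version_by_traversal(doc)
  doc.root.each_element do |child|
    return child.text if child.name == 'version'
  end
  nil
end

# XPath: a single expression names the node directly.
def version_by_xpath(doc)
  REXML::XPath.first(doc, '/config/version').text
end

doc = REXML::Document.new(INTRO_XML)
puts version_by_traversal(doc) # prints 1.7
puts version_by_xpath(doc)     # prints 1.7
```

The traversal code grows with every additional nesting level, while the XPath version only needs a longer expression.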

Installation

You can download XmlConfigFile here. XmlConfigFile depends on REXML, so you will have to install it first. Then run

            ruby install.rb config
            ruby install.rb setup
            ruby install.rb install

A Simple Example

For our first example, we assume that we have a configuration file called example.xml that looks like this:

            <?xml version="1.0" encoding="iso-8859-1"?>

            <!--
              A sample configuration file.
            -->

            <config>
              <version>1.7</version>
              <splash-screen enabled='yes' delay='5000' />
              <greeting lang="en">Hello, world!</greeting>
              <greeting lang="de">Hallo, Welt!</greeting>
              <base-dir>${BASEDIR}</base-dir>
              <db env="test">
                Standard connection.
                <name>addresses</name>
                <user role="admin">scott</user>
                <pwd>tiger</pwd>
                <host>example.com</host>
                <driver>
                  <vendor>MySql</vendor>
                  <version>3.23</version>
                </driver>
              </db>
              <db env="prod">
                <name>addresses</name>
                <user>production</user>
                <pwd>secret</pwd>
                <host>example.com</host>
                <driver>
                  <vendor>Oracle</vendor>
                  <version>8.1</version>
                </driver>
              </db>
            </config>

To load and parse this file, you have to do the following:

            require 'xmlconfigfile'

            config = XmlConfigFile.new('example.xml')

Now you can access all the configuration file's entries via XPath. To get the content of the version element as a String, simply call

            version = config.get_parameter('/config/version') # -> '1.7'

or, even shorter,

            version = config['/config/version'] # -> '1.7'

To get the version element as a float value, call

            version = config.get_float_parameter('/config/version') # -> 1.7

This works similarly for integer values:

            splash_delay = config.get_int_parameter('/config/splash-screen/@delay') # -> 5000

Of course, all Ruby literals for integer and float values (hex, octal, exponential notation, etc.) are supported.
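
As a rough illustration of what "all Ruby literals" means, the following sketch converts text with Ruby's own Kernel#Integer and Kernel#Float. The helpers to_int and to_float are hypothetical and are not part of XmlConfigFile; how the library converts values internally is not shown here.

```ruby
# Hypothetical helpers (not part of XmlConfigFile): convert parameter text
# using Ruby's literal rules via Kernel#Integer and Kernel#Float.
def to_int(text)
  Integer(text.strip)
end

def to_float(text)
  Float(text.strip)
end

puts to_int('5000')    # prints 5000
puts to_int('0x1F')    # prints 31   (hexadecimal)
puts to_int('0o17')    # prints 15   (octal)
puts to_float('1.7')   # prints 1.7
puts to_float('1.5e3') # prints 1500.0 (exponential notation)
```

Both conversions raise an ArgumentError for text that is not a valid number, which is usually what you want for a misconfigured value.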

Boolean values are a bit different. The following table shows which values mean true and which mean false by default in configuration files handled by XmlConfigFile:

            true    false
            ----    -----
            1       0
            yes     no
            on      off
            true    false

It doesn't matter whether they occur as element or as attribute values; both whitespace and case are ignored.

            splash_enabled = config.get_boolean_parameter('/config/splash-screen/@enabled') # -> true

If you want to provide your own values for true and false, just do this:

            config.true_values  = ["HIja'", "HISlaH"]
            config.false_values = ["ghobe'"]

Now, the Klingon phrases HIja' and HISlaH mean true, and ghobe' means false. Whitespace and case are still ignored.
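
The matching rule can be sketched in a few lines of plain Ruby. This is only an illustration of the behaviour described above, not the library's code: the method to_boolean and both constants are hypothetical, the sketch ignores surrounding whitespace only, and raising an ArgumentError for unknown values is an assumption.

```ruby
# Default value lists as given in the table (names are hypothetical).
DEFAULT_TRUE_VALUES  = %w[1 yes on true].freeze
DEFAULT_FALSE_VALUES = %w[0 no off false].freeze

# Compares the text with both lists, ignoring case and surrounding whitespace.
def to_boolean(text, true_values = DEFAULT_TRUE_VALUES, false_values = DEFAULT_FALSE_VALUES)
  normalized = text.strip.downcase
  return true  if true_values.any?  { |value| value.strip.downcase == normalized }
  return false if false_values.any? { |value| value.strip.downcase == normalized }

  # Assumption: unknown values are treated as an error.
  raise ArgumentError, "not a boolean value: #{text.inspect}"
end

puts to_boolean(' YES ')                                 # prints true
puts to_boolean('Off')                                   # prints false
puts to_boolean("hija'", ["HIja'", "HISlaH"], ["ghobe'"]) # prints true
```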

Configuration files often contain different versions of configuration parameters, for example for different countries, languages, or environments. With XPath, it's simple to keep them all in a single configuration file. If you want to get the German version of our friendly greeting element, just call

            greeting = config.get_parameter("/config/greeting[@lang='de']") # -> 'Hallo, Welt!'

To get the name of your production database user, call

            user = config.get_parameter("/config/db[@env='prod']/user") # -> 'production'
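
If you want to try such predicate expressions outside of XmlConfigFile, a small sketch with REXML's own XPath support is enough. The helper first_text and the constant GREETING_XML are made up for this example.

```ruby
require 'rexml/document'

# A shortened version of the example configuration (illustration only).
GREETING_XML = <<~XML
  <config>
    <greeting lang="en">Hello, world!</greeting>
    <greeting lang="de">Hallo, Welt!</greeting>
    <db env="test"><user role="admin">scott</user></db>
    <db env="prod"><user>production</user></db>
  </config>
XML

# Returns the text of the first node matching the XPath expression, or nil.
def first_text(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  node && node.text
end

doc = REXML::Document.new(GREETING_XML)
puts first_text(doc, "/config/greeting[@lang='de']")  # prints Hallo, Welt!
puts first_text(doc, "/config/db[@env='prod']/user")  # prints production
```

The predicate in square brackets filters the candidates by attribute value, so several variants of the same element can live side by side.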

Working with Lists of Parameters

You will often need a bunch of related configuration parameters at the same time. The get_parameters method exists for this purpose. It converts a node list into a Hash. The keys of this Hash are the paths to the individual elements, with the tag names separated by the '.' character by default. The root element (config in our case) is excluded.

So, to get all database configuration parameters for your test environment, you will have to call

            dbParams = config.get_parameters("/config/db[@env='test']/*")

The resulting Hash looks like this:

            dbParams = {
              'db.driver.vendor' => 'MySql',
              'db.driver.version' => '3.23',
              'db.host' => 'example.com',
              'db.pwd' => 'tiger',
              'db.user' => 'scott',
              'db.name' => 'addresses'
            }
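
To show how such a Hash can come about, here is a minimal sketch that flattens the matched elements with REXML. It is not the library's implementation; flatten_elements, path_key, collect_leaves and DB_XML are hypothetical names, and the sketch handles elements only (no attributes).

```ruby
require 'rexml/document'

# The test database entry from the example file (illustration only).
DB_XML = <<~XML
  <config>
    <db env="test">
      Standard connection.
      <name>addresses</name>
      <user role="admin">scott</user>
      <pwd>tiger</pwd>
      <host>example.com</host>
      <driver>
        <vendor>MySql</vendor>
        <version>3.23</version>
      </driver>
    </db>
  </config>
XML

# Joins the tag names from just below the root element down to +element+.
def path_key(element, root, separator)
  names = []
  node = element
  until node.nil? || node.equal?(root)
    names.unshift(node.name)
    node = node.parent
  end
  names.join(separator)
end

# Stores the text of every leaf element below (or equal to) +element+.
def collect_leaves(element, root, separator, result)
  if element.has_elements?
    element.each_element { |child| collect_leaves(child, root, separator, result) }
  else
    result[path_key(element, root, separator)] = element.text.to_s.strip
  end
end

# Flattens all elements matched by +xpath+ into one Hash.
def flatten_elements(doc, xpath, separator = '.')
  result = {}
  REXML::XPath.each(doc, xpath) do |element|
    collect_leaves(element, doc.root, separator, result)
  end
  result
end

doc = REXML::Document.new(DB_XML)
params = flatten_elements(doc, "/config/db[@env='test']/*")
puts params['db.driver.vendor'] # prints MySql
puts params['db.user']          # prints scott
```

Only leaf elements end up in the Hash, so the text 'Standard connection.' that sits directly inside the db element never appears in the result.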

If you want to expand attributes, too, you have to do the following:

            config.expand_attributes = true

Now dbParams = config.get_parameters("/config/db[@env='test']/*") returns:

            dbParams = {
              'db.driver.vendor' => 'MySql',
              'db.driver.version' => '3.23',
              'db.host' => 'example.com',
              'db.pwd' => 'tiger',
              'db.user' => 'scott',
              'db.user.role' => 'admin',     # Yes! Attributes will be returned, too!
              'db.name' => 'addresses'
            }

By default, the single path elements will be separated by a '.' character. If you want to, you can specify an arbitrary String as path separator:

            dbParams = config.get_parameters("/config/db[@env='test']/*", "-silly-")

This will result in the following Hash:

            dbParams = {
              'db-silly-driver-silly-vendor' => 'MySql',
              'db-silly-driver-silly-version' => '3.23',
              'db-silly-host' => 'example.com',
              'db-silly-pwd' => 'tiger',
              'db-silly-user' => 'scott',
              'db-silly-user-silly-role' => 'admin',
              'db-silly-name' => 'addresses'
            }

To convert your whole configuration file into a Hash, call:

            hashConfig = config.get_parameters('//*')

But be careful: In the example configuration file above, many elements have the same 'path name', e.g. there are two elements that will be converted to 'db.user'. Only the last entry will survive! Also note that it is not possible to access "orphaned" text nodes, e.g. the text 'Standard connection.' in the db element of our example configuration file will be ignored.
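
The effect is the ordinary behaviour of a Ruby Hash and can be reproduced without any XML. In this small sketch the method to_flat_hash is a made-up name, and the pairs stand in for the entries that the two db elements would produce.

```ruby
# Builds a Hash from [key, value] pairs; a later pair overwrites an earlier one.
def to_flat_hash(pairs)
  pairs.each_with_object({}) { |(key, value), hash| hash[key] = value }
end

pairs = [
  ['db.user', 'scott'],      # from the test database
  ['db.pwd',  'tiger'],
  ['db.user', 'production'], # from the production database
  ['db.pwd',  'secret']
]

flat = to_flat_hash(pairs)
puts flat['db.user'] # prints production
puts flat.size       # prints 2
```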

If you really need access to a bunch of elements sharing the same name, you should try the following:

            dbParams = config.get_parameter_array("/config/db")

The result looks like this:

            dbParams = [
              { 
                'db.name' => 'addresses',
                'db.user' => 'scott',
                'db.pwd' => 'tiger',
                'db.host' => 'example.com',
                'db.driver.vendor' => 'MySql',
                'db.driver.version' => '3.23'
              },
              {
                'db.name' => 'addresses',
                'db.user' => 'production',
                'db.pwd' => 'secret',
                'db.host' => 'example.com',
                'db.driver.vendor' => 'Oracle',
                'db.driver.version' => '8.1'
              }
            ]

Advanced Features

In addition to the features described in the last section, you will find some advanced features, like referencing environment variables from your configuration files or an automatic reloading mechanism.

Using Environment variables

It is often useful to combine the usage of configuration files and environment variables. XmlConfigFile makes this task easy: You can put references to environment variables into your configuration files and they will get expanded to their actual values as the file is loaded. The syntax for such references is

            ${Name of environment variable}

So, if you have an environment variable that specifies a base directory, as referenced in our example configuration file, you can use it like this:

            baseDir = config.get_parameter('/config/base-dir') # -> Current value of $BASEDIR
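
The substitution itself can be sketched with a single regular expression. This is an illustration, not the library's code: expand_env is a hypothetical helper, and leaving references to unset variables untouched is an assumption made for this sketch.

```ruby
# Replaces every ${NAME} in +text+ with the value of NAME taken from +env+.
# Assumption: references to unset variables are left as they are.
def expand_env(text, env = ENV)
  text.gsub(/\$\{([^}]+)\}/) do |reference|
    env.fetch(Regexp.last_match(1), reference)
  end
end

# A plain Hash is passed instead of ENV to keep the example reproducible.
puts expand_env('${BASEDIR}/data', { 'BASEDIR' => '/opt/app' }) # prints /opt/app/data
puts expand_env('${MISSING}/data', {})                          # prints ${MISSING}/data
```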

Reloading the configuration periodically

It is often convenient to be able to reconfigure a running system without stopping and restarting it. XmlConfigFile supports such a mechanism. Simply provide the length of the reload period (measured in seconds) when creating a new object of class XmlConfigFile:

            config = XmlConfigFile.new('example.xml', 300)

The configuration file 'example.xml' will now be checked for changes every five minutes. If the file's modification timestamp has changed, it will be reloaded automatically. If the modified file is invalid or no longer exists, the last working version will be used and an error message will be sent to STDERR.

Every time a configuration file is reloaded, references to environment variables are replaced by their current values. Please note that a configuration file is reloaded only if its modification timestamp has changed. So, if you change only the environment, nothing will happen until you touch the file.
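
The timestamp check can be pictured with the following sketch. It is not XmlConfigFile's implementation: the class ReloadingFile is hypothetical, it stores the raw file content instead of a parsed document, and it leaves the periodic call to the caller.

```ruby
# Hypothetical class illustrating reloading based on the modification time.
class ReloadingFile
  attr_reader :content

  def initialize(path)
    @path = path
    @mtime = File.mtime(path)
    @content = File.read(path)
  end

  # Re-reads the file only if its modification timestamp has changed.
  # Returns true if a reload happened, false otherwise.
  def reload_if_changed
    mtime = File.mtime(@path)
    return false if mtime == @mtime

    @content = File.read(@path)
    @mtime = mtime
    true
  rescue SystemCallError => e
    # The file vanished or is unreadable: keep the last working version.
    warn "reload failed, keeping last working version: #{e.message}"
    false
  end
end

# Possible usage (assumption): check every five minutes in a background thread.
#   config = ReloadingFile.new('example.xml')
#   Thread.new { loop { sleep 300; config.reload_if_changed } }
```

Because only the timestamp is compared, a change in the environment alone never triggers a reload, which matches the behaviour described above.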

Further Reading

If you are interested in absolute truth, you will have to look at the source code or the API docs.

Acknowledgements

A big "Thank you!" (in no particular order) goes to

  • Yukihiro Matsumoto for Ruby.
  • Frank Tewissen for the Java reference implementation.
  • Sean Russell for REXML.
  • Dave Thomas for Rdoc.
  • Nathaniel Talbott for Test::Unit.
  • Minero Aoki for his setup package.
  • Sandra Silcot for tests and bug fixes.
  • Nigel Ball for contributing code.

Contact

If you have any suggestions or want to report bugs, please contact me (contact@maik-schmidt.de).


Copyright © 2003 by Maik Schmidt.