2016 Gannett Foundation Award for Technical Innovation in the Service of Digital Journalism finalist

ArchieML

 

About the Project

ArchieML is an open-source data format designed to make it easy for reporters, editors, coders and visual journalists to write, edit and collaborate on content for multimedia stories. The format uses as little markup as possible so that the text portion of a document can be read and edited without wading through special characters. ArchieML also makes it easier to include semi-structured data (think XML or JSON) alongside or within sections of prose.

ArchieML is intended to be much more forgiving than other formats for semi-structured data. In JSON or XML, the data object quickly breaks if you forget a comma, slash, quote mark or angle bracket.

Instead, ArchieML uses as little punctuation as possible, which makes it much harder to accidentally break the data object by deleting a special character. This is a critical requirement in a fast-moving newsroom, where we want people with no coding experience to be able to input and edit story data directly.
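As a rough illustration of that difference, the Python sketch below compares a JSON string with one deleted comma against a deliberately simplified "forgiving" reader. The reader, its name `parse_forgiving` and its regular expression are assumptions made for this example; it is not the real ArchieML parser and covers only single-line `key: value` pairs.

```python
import json
import re

# Hand-edited JSON with the comma after "Budget vote" accidentally deleted.
broken_json = '{"headline": "Budget vote" "byline": "Staff"}'


def json_parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical, simplified pattern for a `key: value` line.
# This is an assumption for illustration, not the ArchieML spec.
KEY_VALUE = re.compile(r"^\s*([A-Za-z0-9_\-.]+)\s*:\s*(.*)$")


def parse_forgiving(text):
    """Collect `key: value` lines and silently skip everything else."""
    data = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            data[match.group(1)] = match.group(2).strip()
    return data


notes = """Remember to call the budget office before publishing.
headline: Budget vote
byline: Staff
"""

print(json_parses(broken_json))  # the missing comma invalidates the whole object
print(parse_forgiving(notes))    # the stray sentence is ignored, the keys survive
```

The point of the sketch is the failure mode: one missing character invalidates the entire JSON object, while a line-oriented reader simply ignores any line it does not recognize.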

We introduced the spec in March last year and made our last significant change on June 2 when we added the “freeform” object. The freeform object allows users to easily create and edit multimedia stories by treating the story as one long array in which the majority of the text can be written with no encoding at all.
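The idea of treating a story as one long array can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the spec's actual rules: the function name `parse_freeform`, the `type`/`value` item shape and the `key: value` pattern are all chosen here for illustration, and the file name in the sample story is made up.

```python
import re

# Assumption for illustration: a line shaped like `key: value` is structured
# data, and every other non-empty line is plain story text.
KEY_VALUE = re.compile(r"^([A-Za-z0-9_\-.]+)\s*:\s*(.*)$")


def parse_freeform(text):
    """Turn a story into an ordered list of {"type": ..., "value": ...} items."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no content
        match = KEY_VALUE.match(line)
        if match:
            # Structured item: the key names the kind of element.
            items.append({"type": match.group(1), "value": match.group(2)})
        else:
            # Ordinary prose needs no encoding at all.
            items.append({"type": "text", "value": line})
    return items


story = """The council met for six hours on Tuesday.
image: council-chamber.jpg
The vote came shortly after midnight.
"""

for item in parse_freeform(story):
    print(item)
```

Because the order of the lines is preserved, a page template could walk the list and render each paragraph or media element in sequence.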

We wrote two open-source ArchieML parsers, one in JavaScript and one in Ruby. Since we introduced the format, many other users have created open-source ArchieML parsers for other languages, including R, Python, .NET, Scala and Clojure.

Here are examples of how we’ve used it:
ArchieML is being used in newsrooms in the United States and around the world, including: