Published: 2011-06-18
Tagged: coffee-script, map-reduce, nosql, riak

Writing MapReduce requests for Riak and sending them by hand is rather cumbersome. My little tool riakqp provides a simple, fast, and elegant way to prototype those requests.

riakqp - Riak Query Prototyping in CoffeeScript Syntax

Writing MapReduce requests for Riak and sending them by hand is somewhat cumbersome. A JSON-formatted request is sent with the command line tool curl, which means specifying the headers and the HTTP verb yourself; see, e.g., the examples in Loading Data and Running MapReduce Queries. However, the most inconvenient part is embedding the map and reduce functions in the query. See this query for a non-trivial example.
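
To make the inconvenience concrete, here is a minimal JavaScript sketch of what assembling such a request by hand amounts to. It assumes the /mapred endpoint of Riak's HTTP interface; the helper name buildMapReduceRequest and the sample map and reduce functions are hypothetical and only serve as illustration.

```javascript
// Hypothetical helper: assembles the pieces of a Riak MapReduce request.
// Assumes the /mapred endpoint of Riak's HTTP interface.
function buildMapReduceRequest(bucket, mapFn, reduceFn) {
  const body = {
    inputs: bucket,
    query: [
      // Each function has to be shipped as a string of JavaScript source.
      { map: { language: "javascript", source: mapFn.toString() } },
      { reduce: { language: "javascript", source: reduceFn.toString() } }
    ]
  };
  return {
    method: "POST",
    path: "/mapred",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  };
}

// Illustrative map and reduce functions (not taken from riakqp).
const request = buildMapReduceRequest(
  "alice",
  function (v) { return [v.key]; },
  function (values) { return values; }
);
console.log(request.body);
```

Even in this tiny case the function bodies end up as escaped strings inside the JSON document, which is what makes editing them by hand tedious.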

riakqp lets you specify external CoffeeScript files in your query. Instead of including JavaScript inline like

"source" : "(function(v) { var freq; ...

you reference a source file as in

"source_file" : "word_map.coffee"
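
The idea behind this substitution can be sketched in a few lines of JavaScript. This is not riakqp's actual implementation: the function name inlineSourceFiles and the injected readFile and compile parameters are hypothetical, and the query layout (a "query" array of map and reduce phases) is assumed from Riak's MapReduce request format.

```javascript
// Hypothetical sketch: replace every "source_file" entry of a query
// with an inline "source" string. readFile and compile are injected so
// the sketch does not depend on a particular CoffeeScript compiler.
function inlineSourceFiles(query, readFile, compile) {
  const phases = query.query.map(function (phase) {
    const result = {};
    Object.keys(phase).forEach(function (kind) { // "map" or "reduce"
      const spec = Object.assign({}, phase[kind]);
      if (spec.source_file !== undefined) {
        spec.source = compile(readFile(spec.source_file));
        delete spec.source_file;
      }
      result[kind] = spec;
    });
    return result;
  });
  // Return a copy; the original query object is left untouched.
  return Object.assign({}, query, { query: phases });
}

// Usage with stand-ins for the file system and the compiler.
const files = { "word_map.coffee": "(v) -> [v.key]" };
const inlined = inlineSourceFiles(
  { inputs: "alice",
    query: [{ map: { language: "javascript", source_file: "word_map.coffee" } }] },
  function (name) { return files[name]; },
  function (coffee) { return "/* compiled */ " + coffee; }
);
console.log(JSON.stringify(inlined, null, 2));
```

A real tool would pass a file reader and a CoffeeScript-to-JavaScript compiler in place of the two stand-in functions.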

A short riakqp Tutorial

Prerequisites

I assume that you have a *nix-like operating system with basic tools, including the curl command line program. You will also need node and npm. Install riakqp with npm install -g riakqp. I recommend installing jsonprettify in the same way.

Running example

In the following example, we will count the frequency of words in the book Alice in Wonderland. The source code of riakqp includes the demo/alice-in-wonderland directory; download that one!

The page MapReduce on wiki.basho.com features a similar example written in pure JavaScript. Note that their code works somewhat differently from our example!

Setup

Obviously, you will need a Riak cluster. You might want to follow Building a Development Environment for an initial setup. I assume that one node of your cluster is accessible via http://localhost:8091.

Data upload

Go into the demo/alice-in-wonderland/data directory. Invoke ./put, which should upload each chapter of the book to your Riak cluster. Confirm this by running curl -v localhost:8091/riak/alice?keys=true. The response should contain "keys" : [ "cp2.txt" , "cp7.txt" , "cp8.txt" and so forth. You can also retrieve the first chapter by requesting http://localhost:8091/riak/alice/cp1.txt in your browser.

MapReduce Query

Now, go up one directory into demo/alice-in-wonderland. The files query.json, word_map.coffee, and reduce_word_count.coffee specify our query. We invoke riakqp -q query.json | jsonprettify | grep alice to run the query. This should yield the following result: "alice" : 362.
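
For orientation, a word count of this kind boils down to a map phase that tallies the words of each chapter and a reduce phase that merges the tallies. The following JavaScript sketch is not the content of word_map.coffee or reduce_word_count.coffee; the function names and the tokenization rule are assumptions. It follows the convention of Riak's JavaScript MapReduce that a map function receives the stored object with its data in values[0].data.

```javascript
// Map phase (illustrative): count the words of one stored object.
function wordMap(v) {
  const freq = Object.create(null); // no inherited keys such as "constructor"
  const words = v.values[0].data.toLowerCase().match(/[a-z]+/g) || [];
  words.forEach(function (w) {
    freq[w] = (freq[w] || 0) + 1;
  });
  return [freq];
}

// Reduce phase (illustrative): merge a list of per-object counts.
// It may be applied repeatedly, so it accepts its own output as input.
function reduceWordCount(values) {
  const total = Object.create(null);
  values.forEach(function (freq) {
    Object.keys(freq).forEach(function (w) {
      total[w] = (total[w] || 0) + freq[w];
    });
  });
  return [total];
}

// Local dry run with two made-up "chapters" instead of a Riak cluster.
const chapters = [
  { values: [{ data: "Alice was beginning to get very tired" }] },
  { values: [{ data: "said Alice. Alice laughed" }] }
];
const mapped = chapters.map(wordMap).map(function (r) { return r[0]; });
const counts = reduceWordCount(mapped)[0];
console.log(counts.alice);
```

Because the reduce function returns the same shape it consumes, it can safely be run again on partial results.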

Going further

There is a second demonstration in the demo/google-stock directory. It contains the same dataset as used in Loading Data and Running MapReduce Queries in the Riak Fast Track tutorial. This dataset is very well suited for exploration with your own queries. Have fun!