Disclaimer

Creative Commons License
The content of this work created by myself is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License

Scripts, images and other contents not created by myself but used in this talk are released under their respective original license.

The views and opinions expressed in this talk do not necessarily represent those of my employer.

Introduction - Out There

Behind the Curtain, as a Service

Big Table, Google

Dynamo, Amazon

SimpleDB, Amazon

Azure Table Storage, Microsoft

....

At your Hands

CouchDB (Apache, CouchOne)

Riak (Basho)

Hadoop / HBase (Apache)

Cassandra (Facebook)

Proj. Voldemort (LinkedIn)

MongoDB (10gen)

Redis (VMware)

FlockDB (Twitter)

....

more: nosql-database.org
Introduction - Facets / Classification
  • Wide Column Store / Column Families
  • Document Store
  • Key Value / Tuple Store
  • Graph Databases
  • Object Databases
  • XML Databases
  • ...

huge space

in many respects not novel at all

NoSQL: not a good term

today: Distributed (aka. Cloud~) and Document Oriented

Overview

Part I: Distributed Data

network partitions and the CAP theorem

Map-Reduce

reliability

Part II: CouchDB

inner workings

documents

REST-API

Map-Reduce

application architecture

CAP - Theorem

CAP Theorem - Prelude

Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you’re lost. If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.

Werner Vogels, Amazon CTO and Vice President

CAP Theorem - Network
CAP Theorem - Node Failure
CAP Theorem - Network Partitioning

what now ?

CAP Theorem
you can achieve any two but no more of
  • Consistensy of Data
  • Availability of Service
  • Resiliance to Network Partitioning

Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.

Nancy Lynchand Seth Gilbert. ACM SIGACT 2002

CAP Theorem - Brewers Conjecture

Towards Robust Distributed Systems

ACID

vs

BASE

BASE:

  • Basically Available
  • Soft-state
  • Eventual consistency

Keynote, ACM Symposium on the Principles of Distributed Computing (PODC) 2000

Dr. Eric A. Brewer

CAP Theorem - Critics

Availability is not an option:

→ Consistency & Availability & Partition Tolerance

Consistency & Availability & Partition Tolerance

the (two) Model(s) in the CAP proof are really too simple

you can consistently read but not write

temporary non availability and latencies are an option

the concept of time and duration makes any sens full model non trivial

→ things are complicated

→ BASE and Eventually Consistent properties

→ many distributed NoSQL DBMSes have options to tweak

Map-Reduce & Risk Analysis
Map-Reduce: Business Case & Risk Analysis
Traditional NoSQL MR Cloud
DesignVertical Horizontal
ExampleSAN + Rel. DB + JEE Commodity PCs + Riak + Map-Reduce
Cost 20.000 total 70 / Unit
Req. Units - 3
Reliability 99% total 95% / Unit

How many commodity PCs are required to achieve equivalent reliability?

Map-Reduce: Monte-Carlo Analysis
 3 let random = new System.Random()
 4 let systems = 5
 5 let Reliability = 95 // per cent, pr = 0.95
 7 let RequiredSystems = 3
 9 
15           [ for i = 1 to systems do  yield random.Next(100) ]
16             |> List.map( fun x -> if x < Reliability then 1 else 0)
17             |> List.reduce( + )

15: generates list of random integers, range 0..99 e.g. [ 5, 56, 98, 3, 80 ]

16: maps according to reliability (0..94: running, 95..99 failure) e.g. [1, 1, 0, 1, 1 ]

17: reduces list to integer (number of running systems) , e.g. 4

Map-Reduce & Functional Programming

"Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages."

MapReduce: Simplified Data Processing on Large Clusters"

Jeffrey Dean and Sanjay Ghemawat, Google Labs

"... Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines..."

Functional Programming → Map-Reduce

Map-Reduce: Cascade
 3 let random = new System.Random()
 4 let systems = 4
 5 let Reliability = 95 // per cent, pr = 0.95
 6 
 7 let RequiredSystems = 3
 8 let SampleSize = int (System.Math.Pow( 10.0, 7.0))
 9 
13     [ for i = 1 to SampleSize do
14         yield
15           [ for i = 1 to systems do  yield random.Next(100) ]
16             |> List.map( fun x -> if x < Reliability then 1 else 0)
17             |> List.reduce( + )
18     ] |> List.map( fun x -> if x >= RequiredSystems then 1 else 0)
19       |> List.reduce( + )
20       |> ( fun sum -> float sum / float SampleSize)

Map

Map → Reduce

Map → Reduce → Map → Reduce

Map → Reduce → Reduce

Map-Reduce: Monte-Carlo Analysis in F#
 1 module Simulation // How many cheap systems do you need? 
 2 
 3 let random = new System.Random()
 4 let systems = 4
 5 let Reliability = 95 // per cent, pr = 0.95
 6 
 7 let RequiredSystems = 3
 8 let SampleSize = int (System.Math.Pow( 10.0, 7.0))
 9 
10 printfn "Simulation \nNr. systems -> Reliability"
11 [ for systems  = 1 to  10 do // shadows systems from above
12   yield
13     [ for i = 1 to SampleSize do
14         yield
15           [ for i = 1 to systems do  yield random.Next(100) ]
16             |> List.map( fun x -> if x < Reliability then 1 else 0)
17             |> List.reduce( + )
18     ] |> List.map( fun x -> if x >= RequiredSystems then 1 else 0)
19       |> List.reduce( + )
20       |> ( fun sum -> float sum / float SampleSize)
21 ]|> List.iteri( fun i f -> ( printfn "%2d ->  %3.2f" (i+1) (f*100.)))
  • not necessarily a good example (script, not structured)
  • slow! exec. time more than 1 minute:
    • Lists (maybe)
    • accuracy line 8 and 21 !
    • random line 15
Exact Solution
Map-Reduce: Exact F#
 1 module Exact // How many cheap systems do you need? 
 2 
 3 open Checked // possibly big numbers here
 4 
 5 // pr : probability runns (reliability)
 6 //      prs: reliability system,  r : # req. systems, n: # systems
 7 let pr prs r n =
 8 
 9   // not tail recursive & doesn't memoize !!!
10   let rec factorial x =
11       match x with
12       | 0 -> 1 // 0 corner case
13       | x -> x * factorial (x-1)
14 
15   let pow a b = System.Math.Pow(a, (float b)) // just convenience
16 
17   let choose n k = float ((factorial n) / ((factorial k) * (factorial(n-k))))
18 
19   if r > n then 0.
20   else  [ for k = r to n do yield choose n k * pow (1.-prs) (n-k) * pow prs k ]
21         |> List.reduce(+)
22 
23 printfn "Exact \nNr. systems -> Reliability"
24 [for systems = 1 to 10 do yield pr 0.90 3 systems ]
25 |> List.iteri( fun i f -> ( printfn "%2d ->  %3.5f " (i+1) (f*100.) ) )
Map-Reduce: Closing the Business-Case

number

systems

reliability

1 0.0000000
2 0.0000000
3 85.7375000
4 98.5981250
5 99.8841875
6 99.9913594
7 99.9993973
8 99.9999599
9 99.9999974
10 99.9999998

Conclusions

reliability with commodity hardware and horizontal scaling is possible

in fact: we can achieve extreme reliability which is simply not affordable otherwise (traditional design and vertical scaling)

Part II - CouchDB

COUCH

= Cluster Of Unreliable Commodity Hardware.

CouchDB - Overview

  • A document database server, accessible via a RESTful JSON API.
  • Ad-hoc and schema-free with a flat address space.
  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.
  • Query-able and index-able, featuring a table oriented reporting engine that uses JavaScript as a query language.
Intermezzo - Erlang

Language, Architecture & Principles

  • Syntax, Appearance ← Prolog
  • (almost pure) functional
  • no objects at all
  • lightweight processes, no threads, no shared memory
  • process messaging
  • non defensive programming, fail fast and let it crash philosophy
  • hot-swapping

→ for extremely available and reliable (not only network) services

→ diametrical opposed to almost anything in the { C / C++ / Java / C# } - Languages

Intermezzo - Erlang II

Influential

Actors in Scala, ...

Used at

Ericson, GitHub, Facebook, Twitterfall, Ubuntu, ...

in Projects/Products:

CouchDB, Riak, Membase, SimpleDB, RabbitMQ, ...

more:
CouchDB: Documents and JSON

JSON - JavaScript Object Notation

{ "doc_type" : "wiki-page"
, "title" : "Abstract Talk SBB - NoSQL"
, "tags" :
    [ "couchdb"
    , "map-reduce"
    , "nosql"
    , "sbb"
    , "talk"
    ]
, "times":
    { "created_at" : "2010-10-17T15:55:48.717Z"
    , "modified_at" : "2010-10-17T16:10:54.603Z"
    }
, "content_source" : "CouchDB rocks \n..."
, "content_format" : "markdown"
, "private" : false
}

nesting Arrays, Objects, and Key-Values; lightweight alternative to XML

2005 Ajax = "Asynchronous JavaScript and XML"

2010 Ajax = Ajax; JSON for the better part

Objects and Data-Stores

Rel. Database

ORM

big fat abstraction

Objects

Document Store

small abstraction

Objects

CouchDB, Riak, ...

Objects in JavaScript

OO & RDBMS: bad idea, don't use them together!

B. Stroustrup, Bell Labs

CouchDB - REST API

check:

$ curl -X GET localhost:5984
{"couchdb" : "Welcome", "version" : "1.0.1"}

list databases:

$ curl localhost:5984/_all_dbs
["take-away","take-away_dev","_users"]

create database:

$ curl -X PUT localhost:5984/test
{"ok":true}

list databases:

$ curl localhost:5984/_all_dbs
["take-away","take-away_dev","test","_users"]
CouchDB - REST API II

list documents

$ curl -X GET  localhost:5984/test/_all_docs
{"total_rows":0,"offset":0,"rows":[]}

create document

$ curl -X PUT localhost:5984/test/myDoc  -d "{}"
{"ok":true,"id":"myDoc","rev":"1-967a00dff5e02add41819138abb3284d"}

retrieve the document

$ curl -X GET  localhost:5984/test/myDoc
{ "_id": "myDoc"
,"_rev": "1-967a00dff5e02add41819138abb3284d" }

update document

$ curl -i -X PUT localhost:5984/test/myDoc  -d '{"name":"Joe"}'
HTTP/1.1 409 Conflict
Server: CouchDB/1.0.1 (Erlang OTP/R13B)
...
{"error":"conflict","reason":"Document update conflict."}
CouchDB - REST API III

update requires revision, i.e. proof you update the most recent version

$ curl -i -X PUT localhost:5984/test/myDoc \
  -d '{"_rev":"1-967a00dff5e02add41819138abb3284d","name":"Joe"}'

HTTP/1.1 201 Created
...
Etag: "2-f800182d81918dbfc80350577842757a"
...

Etags for caching

well defined Resources

GET, PUT, DELETE

(and limited use of POST)

for more RESTful architecture goodness:

Language Support and Frameworks

everything that supports HTTP (REST) and maps from/to JSON

you might not need a framework !

Ruby, Ruby on Rails

net/http (generic), CouchRest (close to the metal)

Rails 2 strong dependency on Active Record; a few frameworks,conventions gone ...

Sinatra, a better choice?

some hitches on JRuby and Iron-Ruby (due to gems of various JSON impl.)

Lift / Scala

baked-in as of Lift2 (haven't tried it myself!)

NodeJS

rocks!!!

CouchDB: Map-Reduce
Map-Reduce-functions are writen in

JavaScript

JavaScript the Misunderstood

The (Very) Good

  • Prototypal Inheritance ← Self, (2002 IO)
  • Functional ← Scheme, Lisp

The Bad

  • Syntax, Appearance ← { } - Languages
  • glued together very hastely

The Hideous

  • The DOM (often mistaken with the language itself)

most prevalent programming language today

JavaScript - Mitigating The Bad
  • understand the concepts

and

  • follow strict conventions
  • stay away from the bad parts
  • use JSLint
  • on the browser, stay off the DOM: JQuery, YUI, ...

consider to use CoffeScript

  • gets compiled to readable JavaScript which passes JSLint
  • clear syntax
  • concise and lots of syntactic sugar
  • (almost) no bad parts
  • forget about conventions
CouchDB - MapReduce

Map-Reduce → CouchDB-view

views (and more) are stored in design documents

Design Documents

protected (writeable only for admins per default)

can have attachments and all other features of regular documents

example, design document test in the database take-away:
localhost:5984/take-away/_design/test/

View

sub-object (JSON) of a design document

has one Map-function

and optionally one Reduce-function

CouchDB - MapReduce

Map-Functions

example "docone":

Map-function:

(doc) ->
  emit(doc._id,1)

query

$ curl localhost:5984/take-away/_design/test/_view/docone

result

{"total_rows":237
,"offset":0
,"rows":
  [ {"id":"0024","key":"0024","value":1}
  , {"id":"020","key":"020","value":1}
  , {"id":"0374","key":"0374","value":1},
....
CouchDB - MapReduce

Map-Reduce Example Counting Wiki-Pages

Map-function in CoffeeScript

(doc) ->
  type = doc.doc_type
  emit(doc._id,1) if type? and type is "wiki-page"

compiled to JavaScript

(function(doc) {
  var type;
  type = doc.doc_type;
  if ((typeof type !== "undefined" && type !== null) && type === "wiki-page") {
    return emit(doc._id, 1);
  }
});
CouchDB - MapReduce

Reduce-function

CoffeScript
(key, values, rereduce) ->
  sum = 0
  for value in values
    sum += value
  sum
JavaScript
(function(key, values, rereduce) {
  var _i, _len, _ref, sum, value;
  sum = 0;
  _ref = values;
  for (_i = 0, _len = _ref.length; _i < _len; _i++) {
    value = _ref[_i];
    sum += value;
  }
  return sum;
});

Result:

{"rows":[
  {"key":null,"value":221}
]}

by the way, CouchDB has a integrated sum-function ;-)

if you map to different types, (or you logic requires it) you must implement for the rereduce case

CouchDB - MapReduce

CouchDB Views ≠ SQL Views (resp. Queries)

static

B-Tree indexing is enforced

Consequences

less dynamic, in many ways:

  • index is build on first query: slow, O(n log n) with lots Disk-IO
  • then updated: fast, O(log n) and very little Disk-IO, see B+-trees

it is hard to "screw-up" performance

no dynamic code → (the equivalent of SQL-) JS-Code-Injection is impossible

Other

Show Functions

map your document to different representations, e.g. HTML, JSON, ...

List Functions

represent list of (non reduced) views, e.g. index of your blog (again HTML, ATOM, JSON, ...)

3Tier vs 2Tier Architecture

Map-Reduce, List- and Show-functions let you build 2Tier applications

Rel. Database

(Oracle)

Application Logic

(JEE)

Fat-Client

(Eclipse RCP)

CouchDB

Logic

(Lift2, RoR, NodeJS)

Browser

CouchDB

Ajax

Browser

Opinionated Ramblings

We did it wrong!

all that runs in O(n log n) and that fits in memory is not a problem (performance)

performance problems arise from locking, memory hierarchy effects, remote calls and in particular from blocking (remote calls) and IO

separation of data and it's analysis is wrong; essentially this is why Map-Reduce is so successful

there is no way we will get locking and semaphores right, too difficult

multi threading is wrong

→ light processes, event-loop, none blocking IO, continuations, and messaging

Solutions

Erlang, Scala, F# have plenty of the good stuff

NodeJS is a server-side runtime based on V8, it has almost all of the good stuff, almost nothing of the bad; and it is simple!

Missing-Parts

Real Pain Points

integrated document index and search engine (do you want to run lucene on your android phone?)

something like couchapp with integrated testing, and a less alien environment: maybe based on NodeJS and Jasmine

framework support for the middle layer in 3Tier-applications

What might be missed for particular use cases

automatic sharding, e.g. see Riak and many others

more dynamic queries, see MongoDB and many others

read this:

Why NoSQL is bad for startups

April 1st 2010, Mu Dynamics Research Labs

The End

thank you