The content of this work created by myself is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License
Scripts, images and other contents not created by myself but used in this talk are released under their respective original license.
Big Table, Google
Dynamo, Amazon
SimpleDB, Amazon
Azure Table Storage, Microsoft
....
CouchDB (Apache, CouchOne)
Riak (Basho)
Hadoop / HBase (Apache)
Cassandra (Facebook)
Proj. Voldemort (LinkedIn)
MongoDB (10gen)
Redis (VMware)
FlockDB (Twitter)
....
huge space
in many respects not novel at all
NoSQL: not a good term
today: Distributed (aka. Cloud~) and Document Oriented
network partitions and the CAP theorem
Map-Reduce
reliability
inner workings
documents
REST-API
Map-Reduce
application architecture
Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you’re lost. If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.
Werner Vogels, Amazon CTO and Vice President
you can achieve any two but no more of
|
|
Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.
Nancy Lynchand Seth Gilbert. ACM SIGACT 2002
Towards Robust Distributed Systems
ACID
vs
BASE
Keynote, ACM Symposium on the Principles of Distributed Computing (PODC) 2000
Dr. Eric A. Brewer
Availability is not an option:
→ Consistency & Availability & Partition Tolerance
→ Consistency & Availability & Partition Tolerance
the (two) Model(s) in the CAP proof are really too simple
you can consistently read but not write
temporary non availability and latencies are an option
the concept of time and duration makes any sens full model non trivial
→ things are complicated
→ BASE and Eventually Consistent properties
→ many distributed NoSQL DBMSes have options to tweak
Traditional | NoSQL MR Cloud | |
---|---|---|
Design | Vertical | Horizontal |
Example | SAN + Rel. DB + JEE | Commodity PCs + Riak + Map-Reduce |
Cost | 20.000 total | 70 / Unit |
Req. Units | - | 3 |
Reliability | 99% total | 95% / Unit |
How many commodity PCs are required to achieve equivalent reliability?
3 let random = new System.Random() 4 let systems = 5 5 let Reliability = 95 // per cent, pr = 0.95 7 let RequiredSystems = 3 9 15 [ for i = 1 to systems do yield random.Next(100) ] 16 |> List.map( fun x -> if x < Reliability then 1 else 0) 17 |> List.reduce( + )
15: generates list of random integers, range 0..99 e.g. [ 5, 56, 98, 3, 80 ]
16: maps according to reliability (0..94: running, 95..99 failure) e.g. [1, 1, 0, 1, 1 ]
17: reduces list to integer (number of running systems) , e.g. 4
MapReduce: Simplified Data Processing on Large Clusters"
Jeffrey Dean and Sanjay Ghemawat, Google Labs
"... Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines..."
3 let random = new System.Random() 4 let systems = 4 5 let Reliability = 95 // per cent, pr = 0.95 6 7 let RequiredSystems = 3 8 let SampleSize = int (System.Math.Pow( 10.0, 7.0)) 9 13 [ for i = 1 to SampleSize do 14 yield 15 [ for i = 1 to systems do yield random.Next(100) ] 16 |> List.map( fun x -> if x < Reliability then 1 else 0) 17 |> List.reduce( + ) 18 ] |> List.map( fun x -> if x >= RequiredSystems then 1 else 0) 19 |> List.reduce( + ) 20 |> ( fun sum -> float sum / float SampleSize)
Map
Map → Reduce
Map → Reduce → Map → Reduce
Map → Reduce → Reduce
1 module Simulation // How many cheap systems do you need? 2 3 let random = new System.Random() 4 let systems = 4 5 let Reliability = 95 // per cent, pr = 0.95 6 7 let RequiredSystems = 3 8 let SampleSize = int (System.Math.Pow( 10.0, 7.0)) 9 10 printfn "Simulation \nNr. systems -> Reliability" 11 [ for systems = 1 to 10 do // shadows systems from above 12 yield 13 [ for i = 1 to SampleSize do 14 yield 15 [ for i = 1 to systems do yield random.Next(100) ] 16 |> List.map( fun x -> if x < Reliability then 1 else 0) 17 |> List.reduce( + ) 18 ] |> List.map( fun x -> if x >= RequiredSystems then 1 else 0) 19 |> List.reduce( + ) 20 |> ( fun sum -> float sum / float SampleSize) 21 ]|> List.iteri( fun i f -> ( printfn "%2d -> %3.2f" (i+1) (f*100.)))
1 module Exact // How many cheap systems do you need? 2 3 open Checked // possibly big numbers here 4 5 // pr : probability runns (reliability) 6 // prs: reliability system, r : # req. systems, n: # systems 7 let pr prs r n = 8 9 // not tail recursive & doesn't memoize !!! 10 let rec factorial x = 11 match x with 12 | 0 -> 1 // 0 corner case 13 | x -> x * factorial (x-1) 14 15 let pow a b = System.Math.Pow(a, (float b)) // just convenience 16 17 let choose n k = float ((factorial n) / ((factorial k) * (factorial(n-k)))) 18 19 if r > n then 0. 20 else [ for k = r to n do yield choose n k * pow (1.-prs) (n-k) * pow prs k ] 21 |> List.reduce(+) 22 23 printfn "Exact \nNr. systems -> Reliability" 24 [for systems = 1 to 10 do yield pr 0.90 3 systems ] 25 |> List.iteri( fun i f -> ( printfn "%2d -> %3.5f " (i+1) (f*100.) ) )
|
Conclusionsreliability with commodity hardware and horizontal scaling is possible in fact: we can achieve extreme reliability which is simply not affordable otherwise (traditional design and vertical scaling) |
|
|
→ for extremely available and reliable (not only network) services
→ diametrical opposed to almost anything in the { C / C++ / Java / C# } - Languages
Actors in Scala, ...
Ericson, GitHub, Facebook, Twitterfall, Ubuntu, ...
CouchDB, Riak, Membase, SimpleDB, RabbitMQ, ...
more: |
![]() |
![]() |
{ "doc_type" : "wiki-page" , "title" : "Abstract Talk SBB - NoSQL" , "tags" : [ "couchdb" , "map-reduce" , "nosql" , "sbb" , "talk" ] , "times": { "created_at" : "2010-10-17T15:55:48.717Z" , "modified_at" : "2010-10-17T16:10:54.603Z" } , "content_source" : "CouchDB rocks \n..." , "content_format" : "markdown" , "private" : false }
nesting Arrays, Objects, and Key-Values; lightweight alternative to XML
2005 Ajax = "Asynchronous JavaScript and XML"
2010 Ajax = Ajax; JSON for the better part
|
|
|
B. Stroustrup, Bell Labs
check:
$ curl -X GET localhost:5984 {"couchdb" : "Welcome", "version" : "1.0.1"}
list databases:
$ curl localhost:5984/_all_dbs ["take-away","take-away_dev","_users"]
create database:
$ curl -X PUT localhost:5984/test {"ok":true}
list databases:
$ curl localhost:5984/_all_dbs ["take-away","take-away_dev","test","_users"]
list documents
$ curl -X GET localhost:5984/test/_all_docs {"total_rows":0,"offset":0,"rows":[]}
create document
$ curl -X PUT localhost:5984/test/myDoc -d "{}" {"ok":true,"id":"myDoc","rev":"1-967a00dff5e02add41819138abb3284d"}
retrieve the document
$ curl -X GET localhost:5984/test/myDoc { "_id": "myDoc" ,"_rev": "1-967a00dff5e02add41819138abb3284d" }
update document
$ curl -i -X PUT localhost:5984/test/myDoc -d '{"name":"Joe"}' HTTP/1.1 409 Conflict Server: CouchDB/1.0.1 (Erlang OTP/R13B) ... {"error":"conflict","reason":"Document update conflict."}
update requires revision, i.e. proof you update the most recent version
$ curl -i -X PUT localhost:5984/test/myDoc \ -d '{"_rev":"1-967a00dff5e02add41819138abb3284d","name":"Joe"}' HTTP/1.1 201 Created ... Etag: "2-f800182d81918dbfc80350577842757a" ...
Etags for caching well defined Resources GET, PUT, DELETE (and limited use of POST) for more RESTful architecture goodness: |
![]() |
---|
everything that supports HTTP (REST) and maps from/to JSON
you might not need a framework !
net/http (generic), CouchRest (close to the metal)
Rails 2 strong dependency on Active Record; a few frameworks,conventions gone ...
Sinatra, a better choice?
some hitches on JRuby and Iron-Ruby (due to gems of various JSON impl.)
baked-in as of Lift2 (haven't tried it myself!)
rocks!!!
Map-Reduce → CouchDB-view
views (and more) are stored in design documents
protected (writeable only for admins per default)
can have attachments and all other features of regular documents
example, design document test in the database take-away: localhost:5984/take-away/_design/test/
sub-object (JSON) of a design document
has one Map-function
and optionally one Reduce-function
Map-function:
(doc) -> emit(doc._id,1)
query
$ curl localhost:5984/take-away/_design/test/_view/docone
result
{"total_rows":237 ,"offset":0 ,"rows": [ {"id":"0024","key":"0024","value":1} , {"id":"020","key":"020","value":1} , {"id":"0374","key":"0374","value":1}, ....
Map-function in CoffeeScript
(doc) -> type = doc.doc_type emit(doc._id,1) if type? and type is "wiki-page"
compiled to JavaScript
(function(doc) { var type; type = doc.doc_type; if ((typeof type !== "undefined" && type !== null) && type === "wiki-page") { return emit(doc._id, 1); } });
Reduce-function
CoffeScript
(key, values, rereduce) -> sum = 0 for value in values sum += value sum |
JavaScript
(function(key, values, rereduce) { var _i, _len, _ref, sum, value; sum = 0; _ref = values; for (_i = 0, _len = _ref.length; _i < _len; _i++) { value = _ref[_i]; sum += value; } return sum; }); |
Result:
{"rows":[ {"key":null,"value":221} ]}
by the way, CouchDB has a integrated sum
-function ;-)
if you map to different types, (or you logic requires it) you must implement for the rereduce
case
static
B-Tree indexing is enforced
less dynamic, in many ways:
it is hard to "screw-up" performance
no dynamic code → (the equivalent of SQL-) JS-Code-Injection is impossible
map your document to different representations, e.g. HTML, JSON, ...
represent list of (non reduced) views, e.g. index of your blog (again HTML, ATOM, JSON, ...)
![]() |
Map-Reduce, List- and Show-functions let you build 2Tier applications
|
|
|
all that runs in O(n log n) and that fits in memory is not a problem (performance)
performance problems arise from locking, memory hierarchy effects, remote calls and in particular from blocking (remote calls) and IO
separation of data and it's analysis is wrong; essentially this is why Map-Reduce is so successful
there is no way we will get locking and semaphores right, too difficult
multi threading is wrong
→ light processes, event-loop, none blocking IO, continuations, and messaging
Erlang, Scala, F# have plenty of the good stuff
NodeJS is a server-side runtime based on V8, it has almost all of the good stuff, almost nothing of the bad; and it is simple!
integrated document index and search engine (do you want to run lucene on your android phone?)
something like couchapp with integrated testing, and a less alien environment: maybe based on NodeJS and Jasmine
framework support for the middle layer in 3Tier-applications
automatic sharding, e.g. see Riak and many others
more dynamic queries, see MongoDB and many others
April 1st 2010, Mu Dynamics Research Labs