MongoDB Performance & Durability

Posted by mikeal on July 7, 2010


This post has been coming for some time now. I am in love with this new world of a thousand databases: whatever your use case is, there is a database for you. In the old days you figured out how to keep MySQL or Postgres up, and that was the hammer you used to bang in every screw. Forget screwdrivers; that's another thing you'd have to learn, and it took you long enough just to figure out the hammer. Which is why I'm not all that surprised that everyone who finds a shiny new screwdriver is now trying to bang in all their nails with it.

The database world has spent a long time reassuring you that a database is a place to put things permanently, and that they won't corrupt or degrade over time if you do things right. Remember when memcached wasn't a database? That's because you don't put things into memcached permanently. Now we call memcached a database. It hasn't changed; what has changed is how we evaluate software to decide whether it's a database. Databases are more flexible about where they sacrifice durability and consistency for performance, so the old durability requirements are no longer the test of whether a database is good for our use case. Instead, we have to understand our use case and decide what we need.

I'm going to pick on MongoDB because it takes a cavalier attitude toward durability in pursuit of perceived performance. That attitude isn't clear to most of the developers 10gen is encouraging to drop existing durable solutions (mostly MySQL and Postgres). I'm a CouchDB guy, and I work on CouchDB stuff for a living. But I'm going to talk a lot about Redis and Cassandra and even some relational databases, because they help tell a bigger story about performance and durability across the wider database world. You couldn't see that story by looking only at CouchDB and MongoDB, or only at Postgres and MongoDB. This is from me as a database geek, not from my employer.

Request/Response

Redis is an in-memory data structure server. Think of it as a place on the network to stick your global objects, so that you can scale your application servers horizontally while they share their state in Redis. It is incredibly fast, as it needs to be for its use case. By default Redis returns a response once it has finished putting your write into memory, in other words once the write is available to be read. Optionally you can have Redis return only after the write has been written to its append-only log. By log I mean an internal transaction log; most SQL databases do something similar, and there's more on this later.

Cassandra has a long list of options, set on the client, for what kind of assurances you want when you do a write. These let you say things like “only return a response after you've written this data to 3 other nodes”. This is one of the things they mean by “fully consistent”: if you like, you have assurances that your data is persisted to multiple nodes in the cluster.
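Here is a minimal Python sketch of what waiting for N nodes means. The replicas are simulated in-process with sleeps. `Replica`, `write_with_acks` and `required_acks` are hypothetical names for illustration, not Cassandra's client API; Cassandra expresses this through per-request consistency levels.

```python
import concurrent.futures as cf
import time


class Replica:
    """Hypothetical in-process stand-in for one storage node."""

    def __init__(self, name: str, delay_s: float) -> None:
        self.name = name
        self.delay_s = delay_s  # simulated network + disc latency
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str) -> str:
        time.sleep(self.delay_s)
        self.data[key] = value
        return self.name


def write_with_acks(replicas: list, key: str, value: str, required_acks: int) -> list:
    """Send the write to every replica and return once `required_acks` confirm it."""
    if not 1 <= required_acks <= len(replicas):
        raise ValueError("required_acks must be between 1 and the number of replicas")
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.write, key, value) for r in replicas]
    acked = []
    for fut in cf.as_completed(futures):
        acked.append(fut.result())  # in this sketch a failing replica aborts the write
        if len(acked) == required_acks:
            break
    pool.shutdown(wait=False)  # slower replicas keep applying the write in the background
    return acked


nodes = [Replica("a", 0.01), Replica("b", 0.02), Replica("c", 0.5)]
print(sorted(write_with_acks(nodes, "user:1", "mikeal", required_acks=2)))  # ['a', 'b']
```

The caller gets its response as soon as the two fast replicas confirm. The slow replica still receives the write, just later.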

CouchDB has two options. The first, delayed_commits, returns responses as soon as the document is available and does an fsync() every second. But we encourage people to run in production with delayed_commits off.

In that mode all pending writes are fsync()'d together. When that fsync() returns, the writes that piled up in the meantime are flushed by the next one; this is sometimes referred to as a “group commit”. Responses go back to clients only after the fsync() covering their write has finished.

Under concurrent load, running with delayed_commits off has the same throughput as fsync()ing every second. The only difference is that responses to clients might come back 10-100ms later. It's probably better to have on-disc assurances, and in the real world a sub-second write delay to a client isn't a huge deal. Relational databases have very similar “group commit” and “delayed commit” features in their fsync() strategies.
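Here is a Python sketch of the group commit behaviour described above, built as a log with a single background flusher. It is an illustration under stated assumptions, not CouchDB's implementation. `GroupCommitLog` is a hypothetical class, records are newline-terminated bytes, and each writer blocks until the fsync() that covers its record returns.

```python
import os
import threading


class GroupCommitLog:
    """Hypothetical append-only log that acknowledges writers only after fsync().

    Writes arriving while an fsync() is in progress are batched and made
    durable together by the next fsync() (a "group commit").
    """

    def __init__(self, path: str) -> None:
        self._file = open(path, "ab")
        self._cond = threading.Condition()
        self._pending: list = []  # (record, done_event) pairs awaiting fsync
        self._closed = False
        self.fsync_count = 0
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def append(self, record: bytes) -> None:
        """Block until `record` is on disc, then return (the client's response)."""
        done = threading.Event()
        with self._cond:
            if self._closed:
                raise ValueError("log is closed")
            self._pending.append((record, done))
            self._cond.notify()
        done.wait()

    def _flush_loop(self) -> None:
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:  # closed and nothing left to flush
                    return
                batch, self._pending = self._pending, []
            # New writers now queue up for the *next* fsync() while this one runs.
            for record, _ in batch:
                self._file.write(record + b"\n")
            self._file.flush()
            os.fsync(self._file.fileno())
            self.fsync_count += 1
            for _, done in batch:
                done.set()  # acknowledge every writer in the batch at once

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._flusher.join()
        self._file.close()
```

Writers that arrive while an fsync() is in progress share the next one. As a result, many concurrent `append` calls usually need far fewer fsync() calls than there are writes. That is why throughput holds up even though each individual response waits a little longer.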

MongoDB, by default, doesn't actually send a response for writes. You just write your data to the socket and assume it's going to be available when you try to read it. Under concurrent load you can't reliably store stateful data, like session information, this way.

It's as if your webapp spawned a thread to do some work and set a global at the end, but returned a response to the client immediately. When there isn't any load you'd be fine, because your thread is faster than the round trip to the client. Under heavy load, though, the operations in the thread could take too long. A subsequent request wouldn't have access to that global because it hadn't been set yet.

This is why you can't store session data reliably in MongoDB without changing the default client option so that it waits for a response. No matter how “fast” it is, if you send a response to the client (or no response at all) before the data is available for reading, it's useless.
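The race described above can be reproduced with a toy model. `ToyServer` is hypothetical and is neither MongoDB nor its driver. It only simulates a server that applies writes on a background thread, after a delay that stands in for load.

```python
import queue
import threading
import time


class ToyServer:
    """Hypothetical server that applies incoming writes on a background thread."""

    def __init__(self, apply_delay_s: float) -> None:
        self.apply_delay_s = apply_delay_s  # stands in for time spent queued under load
        self.data: dict[str, str] = {}
        self._inbox: "queue.Queue" = queue.Queue()
        threading.Thread(target=self._apply_loop, daemon=True).start()

    def _apply_loop(self) -> None:
        while True:
            key, value, applied = self._inbox.get()
            time.sleep(self.apply_delay_s)
            self.data[key] = value
            applied.set()

    def write(self, key: str, value: str, acknowledged: bool) -> None:
        applied = threading.Event()
        self._inbox.put((key, value, applied))
        if acknowledged:
            applied.wait()  # only "respond" once the write can be read back

    def read(self, key: str) -> "str | None":
        return self.data.get(key)


server = ToyServer(apply_delay_s=0.2)

# Fire-and-forget: the next "request" reads before the write has been applied.
server.write("session:42", "logged-in", acknowledged=False)
print(server.read("session:42"))  # typically None

# Acknowledged write: the read is guaranteed to see it.
server.write("session:43", "logged-in", acknowledged=True)
print(server.read("session:43"))  # logged-in
```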

To people who don't live in databases all day it's hard to explain just how odd this choice is. I don't know of another database that even allows you to return a response before the written data is accessible. Much less one that doesn't send a response at all by default. It's a bit like using UDP for data you care about delivering. It's theoretically faster, but making sure your data gets somewhere and is accessible is almost always more important.

It's no secret that people are hitting IO concurrency issues with traditional Python and Ruby web frameworks. Solutions to IO concurrency problems are gaining traction: Erlang, node.js, nginx, EventMachine and Tornado all use at least some non-blocking IO to limit the overhead each connection carries. Ruby and Python web stacks traditionally use threads, or in some cases system processes. Each of those has a fixed per-connection cost taken out of the available resources, usually memory.

Once you hit the upper limit on how many connections you can have open at once, you have to limit how long each connection stays open if you want to keep increasing your requests per second. Not waiting for a response from MongoDB cuts down the overall connection time. This must seem like a silver bullet to people scaling their Rails or Django app. But it's entirely possible that under load your users aren't actually seeing the data they wrote. Or they have to hit refresh to finish logging in a few seconds later, once the session data is actually available.
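The arithmetic behind that trade-off is Little's law: concurrency = throughput × time per request. A quick sketch, with hypothetical numbers, assumes a fixed pool of worker threads, each tied up for the whole request.

```python
def max_requests_per_second(workers: int, request_ms: float) -> float:
    """Throughput ceiling when each request holds one worker for request_ms.

    Little's law solved for throughput, with concurrency capped at `workers`.
    """
    return workers * 1000 / request_ms


# Hypothetical: 100 worker threads, 50 ms per request including a database round trip.
print(max_requests_per_second(100, 50))  # 2000.0
# Skipping a ~10 ms write acknowledgement shortens each request to 40 ms.
print(max_requests_per_second(100, 40))  # 2500.0
```

Shaving the wait raises the ceiling. But the milliseconds saved are exactly the time during which the write had not yet been confirmed.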

Durability

After many, many years of database engineering, most databases have come to the same conclusion: an append-only file is the preferred, sane and most assured way to guard against data loss or corruption. This is hard for a lot of people to understand. It's not just possible that a write to disc might fail; it's guaranteed that one will fail at some point in the life of the database. Programs crash, discs have issues, computers are not perfect machines, and at some point in the life of your database there will be a problem.

SQL databases tend to keep an append-only “log”, a sequential record of every transaction on the database, so that it can always be replayed to recover from corruption. Redis keeps a similar log in an append-only file. CouchDB actually uses an append-only btree on disc for its entire database, removing the need for a separate “log”.

The trick to append-only file formats is to write a “header” to the end of the file after every operation. If a crash happens in the middle of a write, you just find the last header and disregard everything after it. If you notice some corruption on disc, it's easy to isolate it to the space between two operations. If you write to a file format “in place” instead of using append-only, the complexity of tracking down corruption and invalid writes is mind-boggling.
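Here is a minimal Python sketch of crash recovery for an append-only file. It swaps in a common variant of the header idea. Each record carries a length and CRC32 prefix instead of a trailing header. Recovery keeps everything up to the last intact record and truncates the torn tail. The file layout and function names are assumptions for illustration, not any particular database's format.

```python
import os
import struct
import zlib

HEADER = struct.Struct(">II")  # payload length, CRC32 of payload


def append_record(path: str, payload: bytes) -> None:
    """Append one framed record and fsync before returning."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def recover(path: str) -> list:
    """Return all intact records and truncate any torn or corrupt tail."""
    records = []
    with open(path, "r+b") as f:
        data = f.read()
        pos = 0
        while pos + HEADER.size <= len(data):
            length, crc = HEADER.unpack_from(data, pos)
            start, end = pos + HEADER.size, pos + HEADER.size + length
            if end > len(data) or zlib.crc32(data[start:end]) != crc:
                break  # incomplete or damaged write: everything from here is discarded
            records.append(data[start:end])
            pos = end
        f.truncate(pos)  # keep only the prefix that ends at the last good record
    return records
```

Because old bytes are never overwritten, a crash can only damage the tail. That is what makes this simple scan sufficient.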

The catch with append-only is that you have to “compact” or “vacuum” from time to time. Every delete and update actually adds to the size of the file. Eventually you'll want to write a new file containing only the latest versions of the data, which dramatically reduces disc usage.
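Compaction, reduced to its core, means replaying the log and keeping only the latest version of each live key. This sketch works on an in-memory list of hypothetical `(key, value)` entries, with `None` marking a delete. A real database would stream the result into a new file and swap it in once it is complete.

```python
from typing import Optional

Entry = tuple[str, Optional[str]]  # (key, value); value None marks a delete


def compact(log: list) -> list:
    """Keep only the latest version of each key, dropping deleted keys."""
    latest: dict = {}
    for key, value in log:  # later entries overwrite earlier ones
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]


log = [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1"), ("b", None)]
print(compact(log))  # [('a', '2'), ('c', '1')]
```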

MongoDB does not keep an append-only log, and does not use an append-only file format for its database. It writes in place to its on-disc format. 10gen has stated that this is “faster” and that compaction is costly. The claim that the actual write is faster is puzzling, and I believe it is entirely untrue. The advantage of writing to an append-only file is that all the writes are sequential, which is significantly faster on spinning discs and even on SSDs. Seeks aren't free, so individual in-place writes will never be faster than append-only ones.

In CouchDB compaction does not lock the database (nothing can lock the database). So the only way compaction might be “costly” is if you trigger it during heavy write load, when new writes and the compaction task would compete for the disc. That is why CouchDB doesn't compact automatically: you are supposed to trigger it when your write load isn't peaking.

As I said before, at some point in the life of your database a disc write will fail. Not keeping anything append-only around is incredibly concerning. Stories like this are worrying not because they expose a few bugs in MongoDB, but because they show inherent architectural problems you cannot overcome long-term without something append-only.

MongoDB encourages you to throw heavy load at it by touting its performance, but everyone's load looks a little different. When MongoDB does fall, it falls hard. You're left with whatever your last backup was, assuming that backup predates the corruption.

Consistency Guarantees

For the most part, when you write data you want assurance that it's going to stay there and be accessible. I touched on this a little earlier when I talked about the request/response differences between MongoDB and everyone else. Understanding the difference between when something is available and when it is persisted to disc is also important. Traditionally, a write isn't considered “guaranteed” until the fsync() to disc finishes.

Some databases, like Cassandra, take this a little further. They offer “full consistency” across a cluster of nodes, with assurances that fsyncs on multiple nodes have completed. CouchDB uses “eventual consistency”: a single CouchDB node has your data on disc and will replicate it to other nodes at some point in the future. CouchDB lets you take nodes offline and bring your entire database with you on your devices, including mobile phones. “Full consistency” across nodes that might be offline is therefore impossible. This is a good example of how use cases differ when you decide what granularity of consistency you need for your application.

Redis is an in-memory database, with a soon-to-be-released version that uses a hybrid of memory and virtual memory. It also keeps an append-only file, so that you can bring a node back up after a crash and get your data back. There are three options for how it fsync()s:

“never”: it just sends writes to the kernel and lets the kernel flush them to the append-only file whenever it feels like it.
every second.
“always”: a “group commit” style continuous fsync().
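Here is a Python sketch of the three policies. Redis itself exposes them through the `appendfsync` setting, with the values `always`, `everysec` and `no`. The `AppendOnlyFile` class below is a hypothetical toy, not Redis code. It simplifies “every second” to a check made on each write, whereas Redis does this in the background. It also fsyncs on every write for `always` rather than grouping writes.

```python
import os
import time


class AppendOnlyFile:
    """Hypothetical append-only file with Redis-style fsync policies.

    policy: "always"   fsync before acknowledging every write
            "everysec" fsync at most once per second (checked on each write)
            "no"       never fsync; the kernel flushes when it likes
    """

    def __init__(self, path: str, policy: str) -> None:
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy {policy!r}")
        self.policy = policy
        self._f = open(path, "ab")
        self._last_fsync = time.monotonic()
        self.fsyncs = 0

    def append(self, command: bytes) -> None:
        self._f.write(command + b"\n")
        self._f.flush()  # hands bytes to the kernel's page cache, not yet the disc
        now = time.monotonic()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self._last_fsync >= 1.0
        ):
            os.fsync(self._f.fileno())
            self._last_fsync = now
            self.fsyncs += 1

    def close(self) -> None:
        self._f.close()
```

`flush()` only moves bytes from Python's buffer into the kernel. Until `os.fsync()` returns, a power failure can still lose them, and that window is what the three policies trade against speed.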

MongoDB writes to a memory-mapped file and lets the kernel fsync it whenever the kernel feels like it. More recent versions added a feature that does an fsync() to disc every minute if the kernel hasn't done it already. So at any time you could lose up to one minute of data if the node goes down. That's longer than most, if not all, other databases that persist to disc at all. (When Redis leaves flushing entirely to the kernel, its window might actually be longer.)
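The size of that window is easy to estimate: the worst case is roughly the write rate multiplied by the interval between fsync() calls. The write rate below is hypothetical.

```python
def worst_case_lost_writes(writes_per_second: float, flush_interval_s: float) -> float:
    """Acknowledged writes made since the last fsync() that a crash could discard."""
    return writes_per_second * flush_interval_s


rate = 500  # hypothetical sustained write rate
print(worst_case_lost_writes(rate, 60))  # 30000 (fsync once a minute)
print(worst_case_lost_writes(rate, 1))   # 500 (fsync once a second)
```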

Some Conclusions

When you look at MongoDB more critically, I don't see how you could actually justify using it for anything resembling the traditional role of a database. It has a great feature list that makes you think of all the things you could do with it. But most, if not all, of those things require it not to lose the data you put in it.

MongoDB would be a good choice if you wanted a cache with great indexing. It would also suit data you needed to store quickly that was too large to fit in memory and that you didn't mind losing. Of course, this is not what it's marketed as. 10gen's sizable marketing effort promotes MongoDB as the new M in your LAMP stack (or its Ruby/Python equivalent). It doesn't address how MongoDB's durability differs from your existing M and from almost any alternative “NoSQL” database.

The clincher for me was when Josh Berkus told me he had an in-memory version of Postgres. Once you turn off the log and all the durability, it's neck and neck with MongoDB on write performance. I thought, “who would be crazy enough to run that in production?” Then I realized it's about the same reliability you get with MongoDB.

It's sad, but in this gold rush of new databases, where you can find one that fits your exact use case, so many people are choosing one that just doesn't fit theirs. If “NoSQL” is going to survive as a movement of replacements for relational databases, it'll have to offer proper durability and consistency guarantees, like its RDBMS counterparts, when the use case requires them. Most have delivered on these guarantees; some have not.

I work on CouchDB. To people writing webapps in Ruby and Python it probably looks like we're competing with MongoDB, but in reality we aren't. We're putting CouchDB on your mobile phone so you can take your applications and data with you and work on them offline. We're trying to extend the web platform to mobile and be the cross-mobile glue for HTML5. But I still find uses for Redis and love it when I need it. I don't have a warehouse full of data, but if I did I'm sure I'd take a serious look at Cassandra. There are great databases out there. People should understand them and use them when they have the use case each one was built for.

Category: Community, CouchDB, Django, Firefox, JavaScript, Python, Web, node.js

72 Comments on MongoDB Performance & Durability


schmichael says:

July 7, 2010 at 2:05 pm

The key thing to remember about MongoDB and durability is that it is not single-server durable. Even saving data with safe=True (”waiting for a response”) doesn’t ensure durability, as it returns before the data is replicated. The developers are planning to add the ability to specify a replication factor to wait for on writes, but as far as I know there’s no timeline for this feature.

So until then, MongoDB isn’t as durable as databases with write-ahead logs. This is absolutely something a developer should know before choosing MongoDB. However, MongoDB is exceedingly fast and offers a wonderful developer experience (schema-less collections, dynamic queries, even some simple GEO-indexing and querying).

I’m a big Mongo fan, but you definitely need to be aware of its shortcomings before using it.
Almad says:

July 7, 2010 at 2:06 pm

Thanks for the article, nice comparison.

I will give our Mongo a bigger stress ;)
mikeal says:

July 7, 2010 at 2:22 pm

@schmichael

i’ve heard their single node durability argument and it ignores the difficulties of actually keeping a multi-node system available during the kinds of corruption issues you have when you don’t keep anything append-only.

if you’re only syncing your mem-mapped btree to disc then a corruption in a btree node will corrupt that entire section of the btree. so in the case of writing bad data, the disc doing an improper write, or a failed write, you could lose a *lot* of data and keeping that up while you take load on another replica during your restore is going to be challenging.

It’s also entirely possible that corruption can happen in the replica. For most databases this is less of an issue: with append-only you can narrow the corruption down to a particular operation. It’s also unlikely that two replicas have the exact same corruption in the exact same operation, unless the data was invalid anyway. But with mongo’s model the corruption will be larger, and it’s much more likely that the sections of the btree affected by corruption will overlap.

-Mikeal
jb says:

July 7, 2010 at 2:23 pm

As schmichael said, 10gen has made it abundantly clear from the beginning that mongodb does NOT have single-server durability. Its entire architecture is designed around replication.

Blocking writes that wait until the data has replicated to a number of data sets that you can define is a feature in 1.5 (the current development branch) and will be in 1.6 (the stable branch due out this month).

Single-server durability is coming in 1.7/1.8.
mikeal says:

July 7, 2010 at 2:27 pm

@jb

you can refer mostly to my comments to @schmichael. it’s great that single-server durability is on their todo list but without architectural changes (which I haven’t seen any sign of) it’s not actually possible.

people have been engineering databases for a long time and append-only (either for the db format or just a transaction log) is what almost all database developers have come to in order to provide durability. until i see something about that I’m not buying a bullet point on their todo list for some future release.

you can track down bugs forever in the current approach but more will come up as soon as new features are implemented or people throw new kinds of load at it.
Alex Popescu says:

July 7, 2010 at 2:45 pm

Hi Mikeal,

This is a behavior that hopefully is better known now (I’ve written about it in February: http://nosql.mypopescu.com/post/392868405/mongodb-durability-a-tradeoff-to-be-aware-of). And while I do agree with you that disk durability is important, I should point out that there are a couple of solutions out there that are betting on replication-durability (e.g. VoltDB). Personally I think there are problems out there that can trade durability for speed, and the only thing that’s important is to be aware of this trade-off.
Sammy says:

July 7, 2010 at 2:47 pm

If you actually look at the mongo Jira tickets, they are going to use a transaction log.
Also, there is and will be a very strong trend away from single-server durability in the cloud, where you have transient machines like EC2. You should also look at VoltDB, which quietly has a similar approach.
mikeal says:

July 7, 2010 at 2:50 pm

@alex

I’m sure some people understand these issues, and most NoSQL databases try to be very upfront about their tradeoffs for their target use case. MongoDB, though, is very aggressively marketed as a generic replacement for all SQL use cases. Its marketing message says a lot about performance and almost nothing about durability.

It’s great to see you, and others, posting clearly about the tradeoffs, i’m just afraid that message isn’t making it through to the average Rails/Django developer.

@sammy can you post a link to that jira ticket?
Josh Berkus says:

July 7, 2010 at 3:04 pm

All of these arguments about “multi-node durability” from the MongoDB camp would have some kind of merit if MongoDB actually implemented it, which it doesn’t. Cassandra, Hypertable, and some forks of Memcached have multi-node durability.
Mathias Stearn says:

July 7, 2010 at 3:23 pm

I can tell you that BTree corruption won’t cause data loss, since the actual data is stored in something resembling a series of doubly linked lists, rather than directly in the indexes. Our durability is very similar to MySQL’s default MyISAM storage engine: if you shut down cleanly your data is safe; if not, you can make a best-effort attempt at repair. In a few months (1.8 timeframe) the only issue will be what the default is.

As for the fire-and-forget mode, I think you are only complaining about the driver’s defaults. It’s completely OK to run with safe-mode for every write, but it will be inefficient with many workloads, since your thread will be stalled after each write. It is especially bad when doing single-threaded loading, since each insert has to wait for a round trip and the user really only cares that the whole thing has finished. With the current defaults it is possible to send many writes, then make a single call that blocks until they are applied (or even replicated to N nodes in 1.5.x). This means your user waits on one round trip rather than one per write. Also, other drivers will soon offer optional db- or collection-level WriteConcerns, similar to the Java driver. These let you say that you want to block until your user data is replicated, but not for writes to the analytics collection. If you’ve ever seen “waiting for google_analytics” while watching a page load, you will know why this is important.
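Here is a back-of-the-envelope model of the single-threaded bulk load described above. It compares a round trip after every write with streaming all the writes and blocking once at the end. The function and all numbers are hypothetical, and the model ignores bandwidth and server-side concurrency.

```python
def load_time_us(n_writes: int, rtt_us: int, apply_us: int, ack_each: bool) -> int:
    """Rough time in microseconds for a single-threaded loader to finish n_writes.

    ack_each=True : wait a full round trip after every write.
    ack_each=False: stream all writes, then block once for the final acknowledgement.
    """
    if ack_each:
        return n_writes * (rtt_us + apply_us)
    return rtt_us + n_writes * apply_us


# Hypothetical: 0.5 ms round trip, 50 microseconds to apply each write.
print(load_time_us(10_000, 500, 50, ack_each=True))   # 5500000 (5.5 s)
print(load_time_us(10_000, 500, 50, ack_each=False))  # 500500 (about 0.5 s)
```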
kristina says:

July 7, 2010 at 3:34 pm

Full disclosure: I work for 10gen.

You strategically posted this when my air conditioning was broken, so here are a few thoughts before I go find somewhere cooler. Since CouchDB is “not a competitor” to MongoDB, it’s nice of you to put all this time into a public service.

> MongoDB, by default, doesn’t actually have a response for writes.

Whoopsy, got your emphasis wrong there. We did this to make MongoDB look good in stupid benchmarks (http://www.snailinaturtleneck.com/blog/2009/06/29/couchdb-vs-mongodb-benchmark/).

Seriously, though, this “unchecked” type of write is just supposed to be for stuff like analytics or sensor data, when you’re getting a zillion a second and don’t really care some get lost if the server crashes. You can do an insert that not only waits for a database response, but waits for N slaves (user configurable) to have replicated that insert. Note that this is very similar to Cassandra’s “write to multiple nodes” promise. You can also fsync after every write.

> MongoDB writes to a mem-mapped file and lets the kernel fsync it whenever
> the kernel feels like it.

fsyncs are configurable. You can fsync once a second, never, or after every single insert, remove, and update if you wish.

> When you look at MongoDB more critically I don’t see how you could actually
> justify using it for anything resembling the traditional role of a database.

This is because you assume you’ll run it on single server. MongoDB’s documentation clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.

Also, as another commenter mentioned, full single-server durability is scheduled for the fall.

> Stories like this (http://www.korokithakis.net/node/119) are dubious not
> because they expose a few bugs in MongoDB but because they show inherent
> architectural problems you cannot overcome long term without something
> append-only.

Stories “like this” show that MongoDB doesn’t work for everyone, particularly people who give no specifics about their architecture, setup, what happened, or anything else. Isn’t it irritating how people will write, “MongoDB lost my data” or “CouchDB is really slow” and provide no specifics?

That’s not to say that things never go wrong, MongoDB is definitely not perfect and has lots of room for improvement. I hope that users with questions and problems will contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog post). Anything to contact the community and let us try to help. I wish every person who tried MongoDB had a great experience with it.

Lots of users, hopefully most, love MongoDB and are using it happily and successfully in production.
Dude says:

July 7, 2010 at 4:02 pm

Well done mongodb dudes laying down the awesome in this thread. There may be issues/tradeoffs today, but they are on their way out.

As for the ‘average’ python/ruby dev not knowing the tradeoffs… Well, that’s not Mongo’s problem… What dev only reads marketing copy?!?!
schmichael says:

July 7, 2010 at 5:38 pm

Your post’s title mentions performance, but the post mainly seems concerned with durability. As I’ve been spending a lot of time lately comparing MongoDB, Cassandra, and PostgreSQL, I’d love to hear you talk more about the performance aspect of the databases you mention.
mikeal says:

July 7, 2010 at 5:54 pm

@schmichael

the way MongoDB achieves greater performance is by sacrificing durability. when most databases talk about performance, particularly when they have a variety of non-default configurations you can run in that enable different performance characteristics, they talk about what durability you sacrifice in return.

in the future i will be posting some side by side numbers with other databases but mongodb will need to be in a non-default configuration to do so.

@mathias

my concern with the defaults on the driver is that its default configuration shaves ~10ms when talking to another machine on the same network. that is unnoticeable to users and developers unless they are under load. but under load, clients might request data that the http server said it wrote and find it isn’t accessible yet.

i’m interested in the group response you’re talking about. does mongodb not have a bulk write api that would essentially do the same thing?

@dude (really…. “dude”?)

we’ll see how those tradeoffs work out. once durability is increased we’ll see how the performance characteristics change. most people do this stuff the opposite direction, make sure you don’t lose people’s data and then work on performance, but we’ll see.
mikeal says:

July 7, 2010 at 6:11 pm

@kristina

for some reason wordpress wanted me to moderate your post so sorry for the delay in it showing up.

>> Whoopsy, got your emphasis wrong there ….. Seriously, though, this “unchecked” type
>> of write is just supposed to be for stuff like analytics or sensor data, when you’re getting
>> a zillion a second and don’t really care some get lost if the server crashes.

did the default change? the last time i attempted a concurrent performance test this was one of the barriers i hit. my issue isn’t that you include this feature, it’s that it’s the default. i certainly believe there is a use case for it, i just think it’s harmful as a default.

>> Since CouchDB is “not a competitor” to MongoDB, it’s nice of you to put all this time
>> into a public service.

haha, that’s funny. i regularly use non-CouchDB databases and I get along great with all the people from other databases at conferences. even if i did feel like we were competing, i wouldn’t care. this post really is about reliability issues i don’t think your users are fully aware of and i honestly hope that you fix.

>> fsyncs are configurable. You can fsync once a second, never, or after every single insert,
>> remove, and update if you wish.

that’s really good to hear. have you optimized for a “group commit” yet?

>> This is because you assume you’ll run it on single server. MongoDB’s documentation
>> clearly, repeatedly, and earnestly tells people to run MongoDB on multiple servers.

I responded earlier to the complexity of actually keeping something available that depends on this. so i won’t cover it again.

>> That’s not to say that things never go wrong, MongoDB is definitely not perfect and has
>> lots of room for improvement. I hope that users with questions and problems will
>> contact us on the list, our wiki, the bug tracker, or IRC (or, heck, write a snarky blog
>> post). Anything to contact the community and let us try to help. I wish every person
>> who tried MongoDB had a great experience with it.

You make it sound like this is all just a matter of bugs. it’s not, and i find blaming it on users who don’t use JIRA or get on IRC a little distasteful.

these issues are architectural and until you do *something* append-only they aren’t going to go away. someone mentioned earlier that you plan to do an append-only transaction log, if that’s accurate then it’s fantastic news.
Jason says:

July 7, 2010 at 6:51 pm

Schmichael makes a good point. What I got from Mikeal is that it is irresponsible to go full-tilt toward performance without providing reasonable durability guarantees.
Mathias Stearn says:

July 7, 2010 at 8:25 pm

@mikeal

I think a lot of this just comes down to differences of opinion, and that is not going to be resolved in the comments section of this blog. I think that single-server durability is a nice-to-have rather than a need for many (not all) use cases. I’d trust an offsite backup (easy in mongo) and an offsite replica (also easy) far more than I’d trust a mechanical disk, or even an array of them. MySQL made the same claim for many years until InnoDB was added, and it still uses a non-durable engine by default. MySQL is probably the most used DB, and many people still use MyISAM tables. So I think it’s hard to claim that such a DB is unusable, but that seems to be the claim you are making.

Some people *do* need very strong durability guarantees, and for them I’d suggest using something other than MongoDB. As Dwight likes to say, “the days of one-size-fits-all storage are over.” I make a point of mentioning our take on durability in my presentations. It has been mentioned many times by us and others, to the point that someone on hacker news is tired of hearing about it. So I don’t think it’s true that most users don’t know about it.

re group commit: That is already how getLastError (the server call to implement safe-mode) works. It blocks the connection until all previous operations have been committed. By default “committed” just means applied to the mmapped structure, but it can also mean replicated or flushed to disk. In many ways this is more efficient than a traditional multi-write api, since the server can start working when it gets the first request rather than waiting for the client to provide the full batch.

re append-only: There are other ways to achieve durability (http://en.wikipedia.org/wiki/Soft_updates is one). We will probably just do a traditional journal, but it’s not true that it’s the “only way”.
pmonks says:

July 7, 2010 at 8:34 pm

@mikeal concluded “When you look at MongoDB more critically I don’t see how you could actually justify using it for anything resembling the traditional role of a database.”

I couldn’t disagree with this more. There are several use cases where performance can trump consistency and/or durability, including:

where not every single drop of the data being stored is “precious” (@kristina’s analytics case being one example of this)
where the database in question is not the system of record for the data it’s storing (for example in the case of a runtime repository in a Content Production style Web CMS)

For these use cases developers tend to default to the old RDBMS workhorse (as you point out yourself), but there are significant benefits in cutting back on the ACID (C and D specifically) and obtaining benefits in terms of performance (both response times and throughput).

Now CouchDB doesn’t allow one to make these tradeoffs (being myopically fixated on ACID at the cost of everything else, including such fundamental database features as dynamic queries) and so isn’t well suited to these use cases. Does that mean CouchDB is a complete crock of sh1t and shouldn’t be used by anyone under any circumstances? Of course not – there are use cases that it’s extremely well suited to. In a similar vein, there are use cases that MongoDB is uniquely well suited for, and provided one is aware of the design trade-offs inherent in CouchDB and MongoDB (not to mention the various other “nosql” products) these things shouldn’t be a surprise.
mikeal says:

July 7, 2010 at 8:42 pm

>> MySQL made the same claim for many years until InnoDB was added and still uses a
>> non-durable engine by default. Given that MySQL is probably the most used DB, and
>> many people still use MyISAM tables, I think it’s hard to make a claim that such a DB is
>> unusable, but that seems to be the claim you are making.

this is actually quite interesting. when Postgres had durability and MySQL didn’t, the marketing behind MySQL was that you don’t need single-server durability and were better off with what they provided. Once MySQL finished adding better durability, their marketing message changed significantly.

while i have mentioned that i don’t feel the MongoDB marketing message is accurate i have been careful not to say that there isn’t *any* use case for it, i just don’t believe the use case it is being marketed for is the one it is actually suited for at this time when compared to other databases in regards to durability.

>> As Dwight likes to say, “the days of one-size-fits-all storage are over.”

I completely agree 100%

the days of mysql-fits-all and postgres-fits-all are over. but you have to admit that the message from 10gen is that mongo is the replacement for mysql/postgres in your stack and by inference is taking on a one-size-fits-all message.

>> re group commit: That is already how getLastError (the server call to implement safe-
>> mode) works. It blocks the connection until all previous operations have been
>> committed. By default “committed” just means applied to the mmapped structure, but it
>> can also mean replicated or flushed to disk. In many ways this is more efficient than a
>> traditional multi-write API since the server can start working when it gets the first
>> request rather than waiting for the client to provide the full batch.

that actually isn’t what i referred to as a “group commit”; i think you’re referring to a traditional “bulk insert”. a “group commit” is where you fsync() all the pending writes and, once that fsync() has returned, you fsync() all the new pending writes that have piled up while that fsync() was running. this way you continuously flush to disk in a fairly efficient manner using larger writes.
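The group-commit loop described above can be sketched in Ruby. This is a minimal illustration under stated assumptions, not CouchDB's or MongoDB's actual code: `GroupCommitLog` and its methods are hypothetical names, each record is one line appended to a local file, and a single background thread issues every fsync(). A writer blocks until an fsync() that began after its write has returned, so all the writes that pile up during one fsync() are flushed together by the next.

```ruby
require 'tmpdir'

# Hypothetical group-commit log: writers append a line and block until a
# single fsync() covering their write has returned. Writes that arrive while
# an fsync() is in progress are flushed together by the next one.
class GroupCommitLog
  attr_reader :batches

  def initialize(path)
    @file = File.open(path, 'ab')
    @file.sync = true        # each write goes straight to the OS, no Ruby buffer
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @written_seq = 0         # sequence number of the last record written
    @synced_seq = 0          # highest sequence number known to be on disk
    @batches = 0             # number of fsync() calls actually issued
    @closed = false
    @flusher = Thread.new { flush_loop }
  end

  # Append one record and block until it is durable.
  def commit(record)
    @mutex.synchronize do
      @file.write(record + "\n")
      seq = (@written_seq += 1)
      @cond.broadcast                              # wake the flusher
      @cond.wait(@mutex) until @synced_seq >= seq  # wait for our batch
      seq
    end
  end

  def close
    @mutex.synchronize { @closed = true; @cond.broadcast }
    @flusher.join
    @file.close
  end

  private

  def flush_loop
    loop do
      target = nil
      @mutex.synchronize do
        @cond.wait(@mutex) while @written_seq == @synced_seq && !@closed
        return if @written_seq == @synced_seq      # closed and nothing pending
        target = @written_seq                      # everything written so far
      end
      @file.fsync                                  # one fsync() for the batch
      @mutex.synchronize do
        @synced_seq = target
        @batches += 1
        @cond.broadcast                            # release the whole batch
      end
    end
  end
end

path = File.join(Dir.tmpdir, 'group_commit_demo.log')
File.delete(path) if File.exist?(path)

log = GroupCommitLog.new(path)
writers = 20.times.map { |i| Thread.new { log.commit("record #{i}") } }
writers.each(&:join)
log.close

puts File.readlines(path).size   # 20: every commit is in the file
puts log.batches                 # at most 20; usually fewer when writers overlap
```

The fsync() runs outside the mutex on purpose: new writers can keep appending while the disk is busy, and they are picked up as the next batch instead of each paying for its own fsync().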

>> re append-only: There are other ways to achieve durability
>> (http://en.wikipedia.org/wiki/Soft_updates is one). We will probably just do a traditional
>> journal, but it’s not true that it’s the “only way”.

i believe i prefaced that paragraph using the word “sane”.

while soft updates are safe, their inventor and the implementers in UFS have admitted that they are very hard to get right.
J Chris A says:

July 7, 2010 at 8:46 pm

@pmonks

I think the use cases you mention (analytics, runtime cache) are not the traditional role of the database.

The traditional role of the database is as a place for “you to put things permanently that won’t corrupt or degrade over time.”

The traditional databases got so good at this, that people started using them for other use-cases, out of familiarity. Some of these non-traditional roles are a sweet spot for NoSQL stores like Memcached and Redis.

What Mikeal takes issue with in this post is the perception that MongoDB is a drop-in replacement for the storage system-of-record part of your application. It can’t fill that role unless it’s run with enough redundancy to make up for the fact that machines fail.

It’s also pretty unique in the fact that after an uncontrolled shutdown there’s a decent chance you’ll be hosed. Even with MyISAM there’s a well documented (if painful) procedure for recovering from power failures or other uncontrolled shutdowns.
pmonks says:

July 7, 2010 at 9:09 pm

@J Chris A:

That sounds like a rather selective definition of “database” to me. According to dictionary.com (representative of several sources I checked):

“da·ta·base [dey-tuh-beys]
–noun
1. a comprehensive collection of related data organized for convenient access, generally in a computer.”
Nuno says:

July 7, 2010 at 10:44 pm

@pmonks: The dictionary is unlikely to be the best source for the definition of a database.
Plus: take it from another geek. That’s not going to become true just because you and @kristina believe in it.

It’s fairly obvious to observe that NO-ONE in a traditional RDBMS problem should trust MongoDB with their information. I do understand you guys have great usability and speed but it’s really a niche product, a tool that is very specific to a small set of problems where durability is no concern. And Mikeal’s point that “most rails developers don’t realize (or even understand) how important this can be” is completely valid.

What’s your justification not to do MVCC?
(Please don’t say it’s slower – you know people in this thread know better than that)

And please stop with rants about performing faster than CouchDB or definitions from dictionary.com. It just makes you look bad to everyone else while not adding anything to the discussion.
Michael Stillwell says:

July 7, 2010 at 11:12 pm

@Mathias Stearn You said: “By default “committed” just means applied to the mmapped structure, but it can also mean replicated or flushed to disk.” Is this documented somewhere? I had been wondering what the safe commit actually did but haven’t been able to find it in the documentation.
El Duderino says:

July 7, 2010 at 11:14 pm

@mikeal

You mad, bro?
mikeal says:

July 7, 2010 at 11:40 pm

@ el duderino

aw shit, i didn’t know this was “the dude”, i just thought “a dude”. didn’t know i was talking to his dudeness.

i take it all back
RethinkDB - The database for solid state drives. says:

July 8, 2010 at 12:01 am

[...] Rogers wrote a blog post on MongoDB performance and durability. In one of the sections, he writes about the request/response [...]
David Zuelke says:

July 8, 2010 at 1:27 am

@Jason: you said “What I got from Mikeal is that it is irresponsible to go full-tilt toward performance without providing reasonable durability guarantees.”

That’s not the problem. The problem is going full-tilt toward performance without providing reasonable durability guarantees while marketing the product as the incarnation of salvation (here: the new M in LAMP) and not being upfront about the tradeoffs (here: the critical lack of acceptable levels of durability).

This is why I like CouchDB. They communicate use cases and tradeoffs openly. They also don’t have a querying API that’s completely polluted with retarded magic, but that’s not what we’re discussing here, so I won’t start a rant :p
kristina says:

July 8, 2010 at 3:26 am

> haha, that’s funny. i regularly use non-CouchDB databases and I get along great with
> all the people from other databases at conferences.

Oh, I tend to bite people when I find out they use another database. Maybe I should stop that?

> even if i did feel like we were competing, i wouldn’t care. this post really is about
> reliability issues i don’t think your users are fully aware of and i honestly hope that
> you fix.

You must be thrilled to learn that single server durability is coming. I look forward to a followup post extolling MongoDB’s virtues this fall.

> I responded earlier to the complexity of actually keeping something available that
> depends on [multiple servers]. so i won’t cover it again.

Yes, it is a difficult, but not unsolvable, problem. Mongo’s made a bunch of tradeoffs in the awesome vs. easy to program area. For instance, remember last year when CouchDB was saying MongoDB sucked because of its lack of concurrency? That it was too complicated to do concurrency in C++ and that Erlang was the way? Well, now Mongo has concurrency, so on to the next “must have” thing.

> You make it sound like this is all just a matter of bugs, it’s not, and i find blaming
> it on users who don’t use JIRA or get on IRC a little distasteful.

People discuss everything from bugs to architecture to lunch on our various forums. I was trying to say, possibly badly, that we have a lot of ways for people with questions, problems, and suggestions to reach out.

Eliminating the methods I outlined, I’m not sure how people with suggestions could reach the developers, other than telepathy.

Also, the user you cite is far from typical. It sucks that some people don’t like Mongo, but there are a lot more posts out there from those who do: http://codeascraft.etsy.com/2010/07/03/mongodb-at-etsy-part-2/, http://blog.eventbrite.com/guest-post-why-you-should-track-page-views-with-mongodb, http://blog.wordnik.com/what-has-technology-done-for-words-lately, http://www.engineyard.com/blog/2009/mongodb-a-light-in-the-darkness-key-value-stores-part-5/ and so on.
Dude says:

July 8, 2010 at 4:32 am

El Duderino and myself are not the same person.
Otávio Sampaio says:

July 8, 2010 at 4:50 am

mikeal thanks,

that was a truly awesome explanation.

regards,

otávio
Suissa says:

July 8, 2010 at 5:14 am

Sweeeeeeeeeeeeeeeeeeeet post! Congratz!
Can I translate it to Portuguese and put it on my blog? And of course I’ll cite your site.
Mike Dirolf says:

July 8, 2010 at 6:02 am

I think that Kristina and Mathias have already done a great job of explaining / defending the decisions that have been made in the development of MongoDB, so anybody reading this comment should probably just go read theirs – I’m sure they’re better.

You say “MongoDB, by default, doesn’t actually have a response for writes. … I don’t know of another database that even allows you to return a response before the write data is accessible”. I wholeheartedly agree. One of the great things about the way MongoDB handles requests is that you have the *option* of not waiting for a response. You also have the *option* of waiting for the write to be committed in the mmap structure, or waiting for the write to be fsynced, or even waiting for the write to be replicated. So all of the behaviors you discuss for other systems’ request/response handling are possible. I’m not sure why giving people the option is such a terrible thing, I guess.

Secondly, I just want to note that you guys are throwing around the word “marketing” a lot, which sort of has this connotation that we’re this evil group of masterminds sitting in a room trying to come up with ways to trick people. That couldn’t be further from the truth. We don’t hire anybody to do “marketing”; we just all make a lot of effort to talk to people, have events, and generally foster a good community. In fact I don’t think I’ve ever given a talk about MongoDB where a question about Couch hasn’t been asked, and I’ve always done the best job I possibly can to give a fair and unbiased report of the differences between the two. Actually I think a lot of you guys have probably seen me talk and answer those questions, and have never approached me with any issues or concerns about how I positioned the two systems. The point is, we try really hard to be honest and straightforward. Every issue you mention here has been blogged about, mentioned in talks, etc. etc. etc. It’s great that you’re interested in talking about these issues, but I just wish the tone was a bit less negative towards all of the folks working on MongoDB.
mikeal says:

July 8, 2010 at 8:38 am

@mike

i don’t disagree that there is a use case for not waiting on the response for a write and, as you point out, MongoDB being the only database to do this probably makes it uniquely capable for those use cases. MongoDB might just be the best DB ever built for analytics.

what i’ve taken issue with is that it’s the *default* option and, for reasons I’ve already mentioned, when attempting to replace the “M in your LAMP stack” with MongoDB it’s going to cause you a lot of pain under load for all the more common use cases.

i’ve met you a few times and have seen your talks and I know that you are a great guy and when you talk about other databases with people you are honest and level about the tradeoffs. i’ve been told some of your colleagues aren’t quite so balanced and after seeing their comments here, on HN, and on Twitter I can’t say i’m all that surprised. but, for what it’s worth, if i had a conference you would be on my list of people to invite (hey, wait a second, want to come to CouchCamp?).

everyone who is building a database right now will live or die based on developer adoption. developer evangelism between these technologies right now is basically marketing and we all have a message. the CouchDB message is about couchapps and mobile and the MongoDB message is to replace your existing 3rd tier in your Rails/Django stack with MongoDB. i know that you’ve mentioned durability publicly but comments from users here, on HN and on Twitter make me believe i was right in my belief that your users weren’t fully aware of the technical issues you currently have with durability.

i really honestly hope that you change some of these defaults and keep an append-only transaction log. Rails/Python needs a new 3rd tier because MySQL is an absolute pain in the ass to set up and maintain. but these people need better durability by default that doesn’t require setting up 3 servers.
mikeal says:

July 8, 2010 at 8:53 am

@suissa

great response. this article is CC licensed, so you’re free to translate it, and i also think it would be great: http://creativecommons.org/licenses/by-nc/3.0/us/

@kristina

concurrency and durability aren’t checkboxes on a feature list; they are constant goals that, when you prioritize them, change your decision making and architectural requirements.

i’m sure that MongoDB doesn’t fall over under concurrent load but saying that it “does concurrency” is kind of like saying it “does internet”.

maybe mongo can handle more concurrent connections in newer releases than previous ones, that’s great, but there is still a global lock on the node during writes, which means that concurrent writes to multiple databases won’t scale linearly as the number of clients increases if they talk to multiple dbs. while i’m sure you handle the connections fine, the fact that write throughput won’t scale linearly, coupled with the default client option to not wait for a response on a write, means that the session storage behavior i bring up in my article will be exposed much earlier under concurrent write load to multiple databases.

i’m sure you’ll get the lock down to the db level in a future release. but many other databases chose to prioritize concurrency and/or durability in early releases, before performance optimizations, and any performance optimization that would degrade concurrency and/or durability is considered a regression.

i guess all I’m trying to say here is that it’s a great goal to replace RDBMSs in Rails/Django/PHP, but the project’s priorities and defaults need to change for MongoDB to be a good replacement.
Spacemonkey says:

July 8, 2010 at 9:21 am

I’m going to talk about two other FOSS projects, Drupal and Joomla (of which I’m a founder). Way back in the early Mambo years, we were taking all the awards and getting all the buzz, and I was pointed at a somewhat unhappy and not-so-friendly post at the Drupal forums about how those Mambo people stole all the limelight and how Drupal could get some love…

Andrew Eddie (another Joomla founder) and I both created accounts and posted very supportive information in that thread, basically pointing out that they needed a dedicated team for advocacy and evangelism (not only to developers but end users and designers as well), and that raising money for publicists was well worth it.

Not once did we accuse Drupal of being liars or saying Drupal sucked. We never said “you’ll never be as popular because your code is no good.” For me such a thing is unthinkable, as we’re all colleagues and should be able to communicate with each other as grownups.

This was maybe six years ago. I tried to find a cached copy but had no luck; I was really hoping to provide a URL. Sorry about that!

While speaking at CMSExpo in Chicago just this year, I had Dries Buytaert in the audience of one of my sessions, and I dedicated several slides in my deck to some of the awesome things they have been doing – and involved Dries in the session so he could answer specific questions from anyone interested, in essence “handing him the mic.”

So why all the Mongo hate? Do you NOT realize that all the buzz MongoDB generates benefits everyone in the NoSQL space? There’s not a single entry on the MongoDB.org website that states something silly like “hey drop all your data on one server with mongodb, you’ll never lose a single byte!” Also, I’ve always heard the message that MongoDB intended to be a best-features replacement of the MySQL+Memcache combo, which is absolutely NOT the same message that you keep accusing the 10gen folks of spreading.

Let’s leave the astroturfing and FUD to companies like Oracle and Microsoft, shall we? This kind of behavior is beneath FOSS (at least in my book).
Glenn Gillen says:

July 8, 2010 at 9:32 am

As a ruby developer and a mongodb user I can say that I barely looked at the marketing materials, and got straight down to examples and then the documentation. Even from my cursory glances I knew that I’d need to run a multi-server setup (which I have no problem with), and that if I wanted to ensure write consistency I’d need to specify that… along with fsync to be doubly sure. I think Mathias and Kristina have pointed out all of the inaccuracies so I need not harp on any more.

But it’s the use of phrases like “to people who don’t live in databases all day…” combined with under-researched opinions that doesn’t sit well with me. While you guard this by saying it’s not the opinion of your employer, it’s hard not to take it as a mud-slinging FUD match.
links for 2010-07-08 – Magpiebrain says:

July 8, 2010 at 1:01 pm

[...] MongoDB Performance & Durability A discussion of the tradeoff between durability & performance in CouchDB and MongoDB (tags: nosql couchdb mongodb redis comparison durability performance) [...]
Christian Romney says:

July 8, 2010 at 1:06 pm

I can see both sides of the defaults argument. On the one hand, it’s just irresponsible to toss a database into a production environment without bothering to configure it properly. On the other hand, “sensible default configurations” save even competent and knowledgeable admins time and effort. Of course, what counts as sensible will vary but the 80/20 rule probably agrees with mikeal’s point. I wouldn’t call this a huge issue, it’s more of an inconvenience.

I also get that it sounds like sour grapes coming from someone who works on Couch, but that doesn’t necessarily mean he’s wrong. My curiosity is piqued enough where I’ll research this for myself, but I sure would have preferred to have read a refutation of mikeal’s append log assertions over “dude, don’t hate.”
Nuno says:

July 8, 2010 at 3:29 pm

>> My curiosity is piqued enough where I’ll research this for myself,
>> but I sure would have preferred to have read a refutation of mikeal’s
>> append log assertions over “dude, don’t hate.”

Anyone? Is ignoring really the answer?

I can only suppose the fact no-one answers means you agree with Mikeal’s point that Mongo has a crippled architecture that will never do things right?

If not can you pleeease explain. Everyone wants to know.
J Chris A says:

July 8, 2010 at 3:48 pm

@Nuno,

I believe the plan is to add an append-only log to MongoDB sometime in the fall. That should help tremendously, at least with the durability concerns.

Adjusting the default client behavior to block until data is available is its own issue; I don’t recall off the top of my head what the plans are for this.
Mathias Stearn says:

July 8, 2010 at 3:49 pm

Nuno: I’m not sure what there is to refute: append-only logs are a good way to ensure on-disk durability, no one disagrees with that. And our architecture (I’m talking about the source code here) is flexible enough that we will be able to add this in our next release cycle, which is only 3 months long.

So no, I don’t agree with the assertion that “Mongo has a crippled architecture that will never do things right”
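To make the append-only idea concrete, here is a minimal Ruby sketch. `JournaledStore` is a hypothetical toy, not MongoDB's planned journal or CouchDB's file format: it assumes one JSON record per line, fsync()s each record before applying it in memory, and on restart rebuilds state by replaying the log from the start, dropping a torn final record left behind by a crash.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical toy key/value store backed by an append-only journal.
# Each set() appends one JSON line and fsync()s it before updating memory;
# a restart rebuilds memory by replaying the journal from the beginning.
class JournaledStore
  def initialize(path)
    @path = path
    @data = {}
    replay if File.exist?(path)
    @log = File.open(path, 'ab')
  end

  def set(key, value)
    @log.write(JSON.generate({ 'k' => key, 'v' => value }) + "\n")
    @log.fsync               # on disk before we apply or acknowledge it
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  def close
    @log.close
  end

  private

  # Apply complete records in order; a final line without "\n" is a torn
  # write from a crash, so stop there and truncate it away.
  def replay
    valid_bytes = 0
    File.foreach(@path) do |line|
      break unless line.end_with?("\n")
      entry = JSON.parse(line)
      @data[entry['k']] = entry['v']
      valid_bytes += line.bytesize
    end
    File.truncate(@path, valid_bytes)
  end
end

path = File.join(Dir.tmpdir, 'journal_demo.log')
File.delete(path) if File.exist?(path)

store = JournaledStore.new(path)
store.set('user:1', 'v1')
store.set('user:2', 'v2')
store.close

# Simulate a crash halfway through a third append: half a record, no newline.
File.open(path, 'ab') { |f| f.write('{"k":"user:3","v":"v') }

recovered = JournaledStore.new(path)
recovered.get('user:1')   # => "v1"
recovered.get('user:3')   # => nil (the torn record was discarded)
```

Because the file is only ever appended to, a crash can damage at most the record being written; everything before it replays cleanly, which is the property an in-place mmapped data file does not give you.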
El Duderino says:

July 8, 2010 at 7:39 pm

It’s pretty obnoxious how much noise this post is making. In simpler times we’d see it for what it is – negative advertising – and ignore it outright. I think the 10gen participants have represented themselves well here without resorting to their own nasty trash-couch style posts. And it must be tempting. mikael comes off like a toolbox full of sour grapes.

Mikeal, you’ll be really super duper happy when Mongo has single disk durability (as it will this year)? I thought you said disks could fail? So why is single server durability anything more than a checkbox you guys throw out as the end-all? And what will you gripe about when Mongo has it?

All I know is if I were running the CouchDB show, I’d be bummed my employees were stretching the truth to slag off the competition. Spacemonkey said it better than me: can’t we leave that crap to Oracle and Micro$oft?

Dude and I are not the same person.
El Duderino says:

July 8, 2010 at 8:02 pm

Oh, and full disclosure – my employer pays Oracle a ton of money for durability of all sorts. Very few of us even know the term NoSql much less can distinguish between (or even name) players in that space. I have no dog in this fight and my only interest is in quality solutions that help me solve problems.

@mikeal – remember the pond you currently swim in – there’s room for you both.

Ok, I’m gonna go download VoltDB.
Riyad Kalla says:

July 9, 2010 at 6:47 am

Realizing that this turned into a bit of a heated discussion I just wanted to add my two cents to the original poster and the Mongo team for following up: I got *a lot* out of this article.

Going in I knew next to nothing about append/transaction logs, fsyncs or any of the things used to make db data durable… it was just a place I stuck stuff. I was aware of replication and that was about it.

Reading the article then all the comments put me in a much better place. I know when using Mongo to look for those settings in the docs now and to look forward to the transaction log (which is something I never would have cared about) since Mikeal gave solid reasons for needing one.

Regardless of where your loyalties fall, this article was a great read and very educational. Thanks Mikeal + Mongo dudes + community.
Colin says:

July 9, 2010 at 7:56 am

I have yet to see a reasonable feature request made to the 10gen folks that fits with their goals (wrt this post, call it single-server durability) and doesn’t end up in JIRA and on the roadmap. I wish all software projects progressed at the pace of Mongo; it’s like Christmas, only quarterly.
cremes says:

July 9, 2010 at 8:06 am

Here are a few ways (using the ruby driver) to insert a document.

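require 'mongo'
collection = Mongo::Connection.new.db('test').collection('docs')  # 1.x-era driver API; db/collection names are placeholders
doc = {'a' => 1, 'b' => 2, 'c' => [4, 5, 6]}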

*default*
collection.insert doc

*safe-mode, blocks until the server responds with ok or assertion*
collection.insert doc, :safe => true

*block until fsync*
collection.insert(doc, {:safe => {:fsync => true}})

*block until written to ‘N’ replicas*
collection.insert(doc, {:safe => {:w => 3}})

*block until written to ‘N’ replicas and fsync’ed*
collection.insert(doc, {:safe => {:w => 3, :fsync => true}})

Someone should benchmark that using the C driver and post the results. I imagine that each example (above) gets slower as more and more “safety” is requested.

For a vast majority of my data, I use the default fire-and-forget. For critical data, I use the last example.
cremes says:

July 9, 2010 at 8:10 am

BTW, using the 5 insertion examples I gave above, I fail to see how that lacks durability. If I can block until my data is written to N replicas and fsynced, that pretty much blows away any append-only data store on a single server for durability.

With the current feature set, I don’t care if single-server durability ever gets *any* love.
mikeal says:

July 9, 2010 at 9:13 am

A few generic followups.

Durability isn’t a “feature”; it’s something that you prioritize continuously, release over release.

Many mentions of an append-only transaction log landing in a future release. Great news!

I really do want MongoDB to start prioritizing durability like they do perceived performance. MongoDB doing better is better for everyone.

In order to get better, MongoDB needs to have a less antagonistic relationship with durability. Falling back to multi-server durability in so many cases is a disservice to users actually trying to keep systems up and accessible with this configuration, as it’s not simple to recover from these states and stay accessible.

Better all-around durability would greatly improve the experience of people running MongoDB in production. Until they ship with at least some of the defaults changed (returning from a write only once the data is accessible) and an append-only transaction log, I don’t think it’s fair to call it a drop-in replacement for MySQL/Postgres.

The default “fire-and-forget” has a use case (analytics) but it’s not for all “non-critical” data use cases. *any* data that you need to be accessible to the client should not use this method. If you continue to use “fire-and-forget” for that data then you will hit the problems I described under load; that’s why it’s such a dangerous default: it doesn’t cover the most common use cases under load.
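The session-storage problem described here can be shown with a deterministic toy model in Ruby. `ToyServer` and its methods are hypothetical and ignore the real wire protocol and drivers: writes sit in a single queue until the server applies them, `insert` returns as soon as the write is queued (fire-and-forget), `insert_safe` blocks until the queue has drained (a simplified stand-in for waiting on getLastError, with one queue shared by all clients), and `find` sees only applied writes, like a read arriving on a different pooled connection.

```ruby
# Hypothetical single-process model of fire-and-forget vs. acknowledged
# writes. One queue stands in for writes the server has received but not
# yet applied; find() only sees applied writes, like a read that arrives
# on a different pooled connection.
class ToyServer
  def initialize
    @store = {}
    @inbox = []
  end

  # Apply the oldest pending write (a real server does this on its own).
  def tick
    op = @inbox.shift
    @store[op[:key]] = op[:value] if op
  end

  # Fire-and-forget: return as soon as the write is queued.
  def insert(key, value)
    @inbox << { key: key, value: value }
    nil
  end

  # Acknowledged write: block until every queued write, ours included, is applied.
  def insert_safe(key, value)
    insert(key, value)
    tick until @inbox.empty?
    :ok
  end

  def find(key)
    @store[key]
  end
end

server = ToyServer.new
50.times { |i| server.insert("analytics:#{i}", i) }  # backlog from other clients

server.insert('session:abc', 'logged-in')            # fire-and-forget
server.find('session:abc')                           # => nil (still queued)

server.insert_safe('session:xyz', 'logged-in')       # waits for the backlog
server.find('session:xyz')                           # => "logged-in"
server.find('session:abc')                           # => "logged-in"
```

With no backlog the fire-and-forget write would usually be applied before the next request, which is why this only shows up once write load builds a queue in front of it.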
MongoDB (Single-Server) Data Durability Guide | The Buzz Media says:

July 9, 2010 at 10:33 am

[...] user or just interested in NoSQL databases in general, you may have seen the excellent “MongoDB has poor data durability by default!” (I am paraphrasing) conversation started by Mikeal [...]
Rodrigo Dellacqua says:

July 9, 2010 at 10:44 am

Mikeal,

I think you are too deeply buried in your own ideas of the world (of databases). What the MongoDB team is doing is just GETTING OUTSIDE THE BOX, something I think you should try.

If durability is such a concern for you, and you like it being a default, that’s great! That’s why you work on a project that aims for MAX D. Others don’t; does that mean they’re wrong? Just because they provide the tools to be durable as options, they’re doing it wrong?

“Better for everyone”? You mean better for you, because I don’t see it like that. Your problem is that you think it all revolves around durability, just because some huge company said so a gazillion years ago when the first RDBMSs were created and matured.

If something new doesn’t challenge the way you think about that, it’s just not worth wasting time on it.

If I were you, I would go out and seek different things so you can open up your mind.

No sane developer or decision maker adopts anything without considering the risks; that’s my job, management. Also, single-server durability WAS a real concern back when you didn’t have virtualized resources at ridiculously low prices, when I had to buy a 50k server to stuff my things into.

I won’t dwell on the fact that, across 50 comments on your post, you keep mentioning marketing and “Replacement for all MySQL/Postgres”. I think you’ve got some problem with that, plus it isn’t true.

Let me ask you this: why would I switch from an ACID, really mature Oracle server to CouchDB? Both aim for durability; one is free, the other is backed by a huge database expert and is expensive. If my business depended on it, do you think anyone would choose CouchDB?

What can I expect from such a new baby boy? That’s called calculating risk. Or you could go outside the box and compare that with durability: Couch isn’t as durable as other really mature solutions out there.

But why do people still choose CouchDB to power their businesses? Or MongoDB? They are aware of the risks. If some random student (and I would say 70% of computer science students go into it because they like to spend hours playing games or on some random social network) says he cannot use MongoDB, and he didn’t go through the docs (very common for the young), he deserves help, not attention. He had a problem, not MongoDB.

I think Mongo’s message is: I’m fast, but as with anything in computer science, I have my cons. Every damn pattern has its cons. Durability’s cons are related to performance. So what? We have dealt with cons since our first program, and that’s how we should treat everything in computer science.

Regards.
Riyad Kalla says:

July 9, 2010 at 11:06 am

As I mentioned above I found this topic fascinating. Primarily because I’m a relatively new MongoDB user and every point that Mikeal brought up I wasn’t even aware of (fsync, append logs, etc.)

In my frustration to try and learn what options I *did* have with MongoDB right now, I spent the morning digging through the docs and compiled this guide for folks looking at all the different ways you can configure Mongo or use your driver to ensure single-server durability:
http://www.thebuzzmedia.com/mongodb-single-server-data-durability-guide/

As Mike has already mentioned, we have a transaction or append log coming in the 1.7/1.8 timeframe, but 10gen still has to spec that work out, so nothing is firm.

I think this conversation likely went a long way to push that priority up the list and I certainly appreciate that.
Rodrigo Dellacqua says:

July 9, 2010 at 2:42 pm

Beat this.

http://blog.wordnik.com/

9 Billion Records on a Dictionary Webapp. Do you think they would be happy if they lost any of those records? I guess not. Why are they happy? Because they are using it the proper way, not just installing it and forgetting about it.

Regards.
Mardix says:

July 9, 2010 at 4:32 pm

Just my two cents.

Right now I am riding with mongodb. And I’m really happy about what I get out of it.

One thing I can see though, right now MongoDB is becoming the leader in the NoSQL movement and CouchDB (read Mikael) is not happy about the adoption rate of MongoDB.

I know some things have to be fixed, but in general, MongoDB will be fine for most people.

What does make MongoDB successful right now?

1. Documentation: It’s pretty easy and readily available on the site.
2. Drivers: You will find drivers for almost any language.
3. API: It feels so natural. And most of the queries are already done for you; just fill in the blanks.
4. Easy to install and run. What more can I say? It’s just a drop in.
5. BSON.

Technically, a lot of people who have been dealing with MySQL will tend to try and adopt Mongo faster.

This is a great post, however you can see the bitterness of Mikael in it.
mikeal says:

July 9, 2010 at 7:39 pm

@madrix

somewhere you’re missing the fact that I want MongoDB to improve.

the points you bring up in relation to its success are great ones and everyone should take note (except maybe BSON, but that’s a matter of opinion).

i’m not bitter, i just have the view of durability that the rest of the db world tends to have and when you’re moving from another db *to* mongo you should be aware of the tradeoffs. but at the end of the day i don’t think those tradeoffs should be made and mongodb should just be better at durability, which sounds like the direction they are taking.

this post isn’t about CouchDB. if you have mongodb set up and you find that the durability is unacceptable, CouchDB still may not be the best solution for you. just be aware of *all* the databases out there and choose one that is well tailored to your use case.
Marc says:

July 10, 2010 at 9:19 am

@kristina

I have to say, as someone who has been pondering using MongoDB for several months now, I was taken aback by the lack of professionalism in your posts. It’s one thing when an unpaid developer on a free, open-source project makes snarky remarks, but another when it comes from a paid developer and representative of a for-profit corporation. It didn’t seem that the issues Mikeal raised or how he raised them were unreasonable or done in a rude way, and I would have expected the responses, especially from employees of 10gen, to maintain the same kind of civility. You have some legitimate differences of opinion and there’s no reason for that debate to devolve into things like:

“Oh, I tend to bite people when I find out they use another database. Maybe I should stop that?”

It makes me wonder if this is the kind of attitude I can expect when I raise questions or concerns with 10gen should I decide to start using MongoDB.

Full disclosure: I don’t use MongoDB, CouchDB or any other NoSQL DB yet. I have several issues with CouchDB, some specific to my use case and some just conflicts with my personal preferences. While I find MongoDB very enticing, I’ve yet to commit to using it because I’m not yet comfortable with the answer to the question: “how do I recover when the server/disk/database fails?”
Nuno says:

July 10, 2010 at 9:28 am

Thank you, Matthias. Yes, that’s obvious to me. What wasn’t obvious was what Mongo thought about it. Now it is.
Rodrigo Dellacqua says:

July 10, 2010 at 10:34 am

@Marc

Multiple-server durability is what you’re looking for. That’s how you recover. One of your servers failed permanently? No worries: set up a new one, point it at the cluster, start replication. Done.
Nuno says:

July 10, 2010 at 6:18 pm

@Rodrigo: http://www.joewrite.com/wp-content/uploads/2009/03/haters-gonna-hate.gif .

I fully subscribe to what Marc said. People from Mongo gave me (in this post) the impression that 10gen is filled with trolls. I’m sorry for everyone else who was nice (Matthias and Mike), but it’s how I feel after this. At least your developers are passionate (that’s good), but does this post make enough sense to get them that angry? So angry that they give an awful impression of their employer and make people never want to approach anyone from Mongo for a nice, friendly conversation?

My first bad impression of Mongo was when I saw a presentation saying “XML sucks”, which really means the developer is either ignorant or “playing dumb”. Different tools for different tasks: JSON for its use case, XML for its use case. Second was BSON, which looks like Oracle’s binary XML, which just means you don’t really understand the format. Third was when I found out about the things being discussed here. A database has to safekeep its information; it can’t become corrupt with no way to restore. Sorry, it can, but only in very few use cases. Definitely not in a “new M in LAMP” kind of thing… Journaling and MVCC are the only two ways I know that enable you to restore data when things go wrong. And as Mikeal pointed out, things will go wrong at some point. I might be totally wrong (or totally right), but this shows me that I’m right about one thing:

The dramatic difference between Couch and Mongo is that people from Couch are friendly, want to learn, and want to make their product better. They care about databases and are excited to be changing them, even if only a little bit. Mongo: well, we have this thread, haven’t we?
pablo says:

July 10, 2010 at 10:53 pm

I’m a mongodb user.
@Nuno, kristina is very smart and an excellent speaker.
Come on, can’t you understand a joke?

I’ve started using CouchDB but switched to MongoDB because I couldn’t understand what CouchDB is. Is it a db, a web framework, a chat server, a p2p web ???
Look how jchris tried to trick me into using CouchDB as a chat server:
http://lists.therestfulway.com/pipermail/webmachine_lists.therestfulway.com/2009-July/000006.html
Can’t the CouchDB team just focus on the db instead of doing all these “cool” projects all the time? CouchDB doesn’t have official drivers for popular languages so it’s unusable for a single developer. The community drivers are nice but not professional and not consistent. Oh, I forgot, it’s http, it’s cool, so we don’t need drivers…right…

I understand what MongoDB is for. It works great. The team is super responsive in the chat and mailing list. The drivers are excellent.
J Chris A says:

July 10, 2010 at 11:21 pm

@Pablo

Almost got ya!

Seriously, CouchDB is just a state-machine with a realtime HTTP interface. Do what you like with it. It’s actually better for something like chat than anything else I’ve used. No tricks.

The key is that you can use CouchDB + a browser to do the entirety of your CRUD application. Less code, less complexity, more reliability. And then we make it super simple to add async processes like image resize or email sending, without getting in the critical path.

I know some developers get off on complexity. CouchDB isn’t part of that scene. The point is to remove stuff from your stack that you don’t need. Scalable simplicity is more important than performance.

Maybe we should make a page that links to “drivers” for the CouchDB HTTP interface. I haven’t had an issue using it with any language, but it never hurts to make the good choices more obvious. Thanks for the suggestion.
pablo says:

July 11, 2010 at 1:16 am

@J Chris
Exactly my point. MongoDB follows the philosophy of “do one thing and do it well” while CouchDB can do “anything”.

|Maybe we should make a page that links to…
Maybe you should wait several years on that. Don’t rush. It’s probably complicated to add several links.

@mikeal
I suggest you focus on improving your software instead of trashing a great product. It just shows how insecure you are.

If someone is having trouble choosing between CouchDB and MongoDB, I suggest they take a look at JChris’s talk at Google, which is boring and inconsistent compared to Kristina’s talk, which is excellent:
JChris:
http://www.youtube.com/watch?v=ESDBM9-U804
Kristina:
http://www.youtube.com/watch?v=dOP3w-9Q6lU
Stig says:

July 11, 2010 at 6:03 am

MySQL supports setting the default-storage-engine.
The default default is currently MyISAM.
Oracle has announced that InnoDB will become MySQL’s default storage engine.
Rodrigo Dellacqua says:

July 11, 2010 at 3:06 pm

@Nuno, so someone comes out in public and posts something that’s completely untrue about a database that HUNDREDS of successful production apps use, and he should be free from any discussion of his thoughts? Sorry, if you can’t listen, you shouldn’t write.

Anyhow, we could just come in here and trash CouchDB, right? BUT we didn’t. We answered lies with truth.

It’s not hard to google “CouchDB sucks” and find a post with someone trashing CouchDB and your lovely community arguing that it’s not like that.

Your point of view ONLY, much? kthx bye
mikeal says:

July 11, 2010 at 3:27 pm

This post is not about CouchDB vs MongoDB, and that should be clear to anyone who reads the article in its entirety.

It’s about durability in the broader world of databases, which is why there is so much information about Cassandra, Redis and RDBMSs in general. MongoDB has a different view of durability, as you can see from their implementation decisions, priorities, and comments on this article. They plainly disagree with the traditional approach to durability, and that isn’t a difference between MongoDB and CouchDB; it’s a difference between MongoDB and almost any other database you could choose to solve a similar problem.

There are a lot more people commenting on this article than just MongoDB and CouchDB people, I can see people from MarkLogic and Postgres in the comments and on Twitter many other RDBMS and Redis people have commented as well.

I’m happy to see some people are working on better durability in MongoDB and that the community seems to be excited about it. That’s great and I don’t understand why there is so much anti-durability sentiment from other commenters.
Episode 3: Data store and visualization news | Histojamming says:

July 11, 2010 at 5:21 pm

[...] MongoDB Performance & Durability [...]
Sammy says:

July 12, 2010 at 10:18 am

> That’s great and I don’t understand why there is so much anti-durability sentiment from other commenters.

Because that’s often an outdated and dangerous way of thinking.

Anyone who thinks that their data is safe with durability, please let me know what services you run so I can avoid them…

I’d take data in memory on 2 servers across the country over on disk in 1 any damn day.

(w=2 FTW)
links for 2010-07-13 « Gatunogatuno’s Weblog says:

July 13, 2010 at 1:07 am

[...] MongoDB Performance & Durability: a not-so-hyped opinion on NoSQL, especially MongoDB (tags: nosql database performance mongodb) [...]
Nuno says:

July 13, 2010 at 10:08 am

@sammy: did your bank ever lose your money? If it did (it probably has, through some computer problem), how would you feel about it not having a DB journal so it could recover (the way someone probably did before you could even notice)? What about your medical records? What about your Gmail account? What about …

Just because it’s impossible to have a perfect solution doesn’t mean no solution is necessary. I don’t think anyone else here will agree with you that durability is outdated. Mongo plans to improve on this, according to what I’ve read here. They even agree (at least to my understanding) that MVCC is the way to go to prevent problems and have a more solid, less corruption-prone storage system. Bottom line: having replication doesn’t mean you don’t need to care about durability; it just means you have more options.
Zahariash says:

July 15, 2010 at 8:42 pm

Despite what some people say, there is no such thing as “single server durability”. If you refuse to admit that, you will be crying some day… It’s that simple.

Ask yourself if you are fine with taking a production server offline to recover from the DB’s append-only log. Is 5 minutes of downtime OK? 10? Maybe 30…? Oh, you don’t have a verified procedure for this… So let’s change the units to hours. Your clients are so patient and forgiving…