
Your Startup Sucks

& OTHER HAPPY THOUGHTS
Fighting cynicism through sarcasm, one quibble at a time.


  • November 6, 2011 9:00 am

    Why The MongoDB Hate?



    Disclosure: I hack on MongoDB.
    Update: Check out Eliot’s (10gen’s CTO) response here

    I was a little surprised to see all of the MongoDB hate in this Hacker News thread (and later this similar proggit thread). I’m going to do my best to reply to these concerns directly and with a minimum of breast-beating.

    There seems to be quite a bit of misinformation out there: lots of folks seem focused on the global R/W lock and how it must lead to lousy performance. In practice, the global R/W lock isn’t optimal, but it’s really not a big deal. Here’s why:

    First, MongoDB is designed to be run on a machine with sufficient primary memory to hold the working set. In this case, writes finish extremely quickly and therefore lock contention is quite low. Optimizing for this data pattern is a fundamental design decision.

    Second, long-running operations cause the MongoDB kernel to yield the lock (i.e., just before a pageout), which gives other operations a chance to run. This prevents slow operations from screwing the pooch, so to speak. It’s not perfect, but it smooths over many problematic cases.
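
    To make the yielding idea concrete, here is a toy Python sketch. It is not MongoDB’s actual code: it assumes a single global lock, a scan that offers the lock back every few documents, and a queue of short writes waiting on it. The names `long_scan`, `run`, and `yield_every` are made up for illustration.

```python
from collections import deque


def long_scan(num_docs, yield_every):
    """A long-running operation that offers the lock back periodically."""
    for i in range(1, num_docs + 1):
        # ... pretend to examine document i while holding the lock ...
        if i % yield_every == 0 and i < num_docs:
            yield i  # hand the lock back so a waiting operation can run


def run(num_docs, yield_every, waiting_writes):
    """Return the order in which work completes under one global lock."""
    log = []
    pending = deque(waiting_writes)
    for checkpoint in long_scan(num_docs, yield_every):
        log.append(f"scan yielded at doc {checkpoint}")
        if pending:
            # One short write sneaks in each time the scan yields.
            log.append(f"write {pending.popleft()} ran")
    log.append("scan finished")
    # Anything still queued has to wait for the scan to complete.
    while pending:
        log.append(f"write {pending.popleft()} ran")
    return log
```

    In this model, a scan that never yields forces every queued write to wait until it finishes, while a scan that yields every couple of documents lets the writes interleave.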

    Third, the MongoDB developer community is extremely passionate about the project. Fine-grained locking and concurrency are areas of active development. The allegation that features or patches are withheld from the broader community is total bunk; the team at 10gen is dedicated, community-focused, and honest. Take a look at the Google Group, JIRA, or disqus if you don’t believe me: “free” tickets and questions get resolved very, very quickly.

    Other criticisms of MongoDB concerning in-place updates and durability are worth looking at a bit more closely. MongoDB is designed to scale very well for applications where a single master (and/or sharding) makes sense. Thus, the “idiomatic” way of achieving durability in MongoDB is through replication — journaling comes at a cost that can, in a properly replicated environment, be safely factored out. That’s another design decision.

    Next, in-place updates allow for extremely fast writes provided a correctly designed schema and an aversion to document-growing updates (e.g., $push). If you meet these requirements (or select an appropriate padding factor), you’ll enjoy high performance without having to garbage collect old versions of data or store more cruft than you need. Again, this is a design decision.
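
    A rough way to see why growth and padding matter is the toy model below. It is a simplification, not how MongoDB’s storage layer actually works: it assumes each record is allocated its current size times a padding factor, and that any update which no longer fits forces a move. `count_moves` and its parameters are hypothetical names.

```python
def count_moves(initial_size, growth_steps, padding_factor):
    """Count how often a record must be relocated as its document grows.

    Toy model: a record is allocated initial_size * padding_factor bytes.
    An update that still fits is applied in place; one that does not
    forces a move to a fresh allocation sized with the same factor.
    """
    size = initial_size
    allocated = int(initial_size * padding_factor)
    moves = 0
    for growth in growth_steps:
        size += growth
        if size > allocated:
            moves += 1
            allocated = int(size * padding_factor)
    return moves
```

    Under these assumptions, fixed-size updates (a growth of 0 each time) never move the record. Five 10-byte appends to a 100-byte document cause five moves with a padding factor of 1.0 and none with a factor of 1.5.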

    Finally, it is worth stressing the convenience and flexibility of a schemaless document-oriented datastore. Migrations are greatly simplified and generic models (e.g., product or profile) no longer require a zillion joins. In many regards, working with a schemaless store is a lot like working with an interpreted language: you don’t have to mess with “compilation” and you enjoy a bit more flexibility (though you’ll need to be more careful at runtime). It’s worth noting that MongoDB provides support for dynamic querying of this schemaless data: you’re free to ask whatever you like, indices be damned. Many other schemaless stores do not provide this functionality.
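
    For a feel of what ad hoc querying over schemaless documents means, here is a small in-memory sketch in Python. It is an illustration only, not MongoDB’s query engine: `matches` and `find` are hypothetical helpers, the product documents are invented, and only equality and a `$gt` comparison are handled.

```python
def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`.

    A condition is either a plain value (equality) or a dict holding a
    "$gt" operator; this is a tiny subset chosen for illustration.
    """
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            if "$gt" in condition and not value > condition["$gt"]:
                return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Scan every document; no index or declared schema is required."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
products = [
    {"name": "lamp", "price": 30, "wattage": 60},
    {"name": "novel", "price": 12, "author": "Le Guin"},
    {"name": "desk", "price": 150},
]
```

    Here `find(products, {"price": {"$gt": 20}})` returns the lamp and the desk, and `find(products, {"author": "Le Guin"})` returns only the novel, even though most documents have no `author` field at all. The price of that freedom is the full scan, which is where indices come back in.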

    Regardless of the above, if you’re looking to scale writes and can tolerate data conflicts (due to outages or network partitions), you might be better served by Cassandra, CouchDB, or another master-master/NoSQL/fill-in-the-blank datastore. It’s really up to the developer to select the right tool for the job and to use that tool the way it’s designed to be used.

    At the end of the day, MongoDB is a neat piece of software that’s designed to be useful for a particular subset of applications. Does it always work perfectly? No. Is it the best for everything? Not at all. Do the developers care? You better believe they do.

    Things haven’t always been peachy. That’s why we recommend all new users start with the 2.0.x series (frankly, it seems unfair to rail on a project when you’re two major versions behind). If you look back, you’ll find mistakes, but you’ll also find a team of dedicated people who have worked hard to fix those mistakes.

    10gen has built a novel datastore that offers high availability, sharding, and schema-free design at a very specific cost. Bugs will be pushed, mistakes will be made, and systems will go down. There is no silver bullet.

    If you’ve got a mission critical application and you’re looking for a datastore, the first question you should be asking isn’t about internals or anecdotes, it’s “how are the inevitable boo-boos handled?” If the answer isn’t “efficiently, transparently, and with a heaping spoonful of honesty” — like it is with 10gen — you’ve got bigger problems.
