From the jepsen report:
"""
Curiously, MongoDB omitted any mention of these findings in their MongoDB and Jepsen page. Instead, that page discusses only passing results, makes no mention of read or write concern, buries the actual report in a footnote, and goes on to claim:
> MongoDB offers among the strongest data consistency, correctness, and safety guarantees of any database available today.
We encourage MongoDB to report Jepsen findings in context: while MongoDB did appear to offer per-document linearizability and causal consistency with the strongest settings, it also failed to offer those properties in most configurations.
"""
This is a really professional way to tell someone to stop their nonsense.
You can tell a lot about a developer by their preferred database.
* Mongo: I like things easy, even if easy is dangerous. I probably write Javascript exclusively
* MySQL: I don't like to rock the boat, and MySQL is available everywhere
* PostgreSQL: I'm not afraid of the command line
* H2: My company can't afford a database admin, so I embedded the database in our application (I have actually done this)
* SQLite: I'm either using SQLite as my app's file format, writing a smartphone app, or about to realize the difference between load-in-test and load-in-production
* RabbitMQ: I don't know what a database is
* Redis: I got tired of optimizing SQL queries
* Oracle: I'm being paid to sell you Oracle
All: we've changed the submitted URL from https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6 to the work it is reporting on. You might want to read both, since the infoq.com article does give a bit of background.
Edit: never mind, I think the other URL - http://jepsen.io/analyses/mongodb-4.2.6 - deserves a more technical thread, so will invite aphyr to repost it instead. It had a thread already (https://news.ycombinator.com/item?id=23191439) but despite getting a lot of upvotes, failed to make the front page (http://hnrankings.info/23191439/). I have no idea why—there were no moderation or other penalties on it. Sometimes HN's software produces weird effects as the firehose of content tries to make it through the tiny aperture of the frontpage.
Lying about your test results from Jepsen is like going onto a reality show with Chef Ramsay, being thrown off for incompetence, then putting his name on your restaurant's ads: "Chef Ramsay ate here!"
I'd pay to watch Kyle screaming at people in the MongoDB offices, not that he screams or anything. Just a spectacular mental image: "IT'S NOT ATOMIC! IT COULDN'T SERIALIZE A DOG'S DINNER!"
MongoDB's big problem is that their present user base does not want the problems fixed, particularly at default settings, because it would mean going slower. Their users are self-selected as not caring much about integrity and durability. There are lots of applications where those qualities are just not very important, but speed is. People with such applications do need help with data management, and have money to spend on it.
The stock market wants to see the product as a competitor with Oracle, so demands all the certifications that say so. MongoDB marketing wants to be able to collect money as if the product were competitive. Many of the customers have management that would be embarrassed to spend that kind of money on a database that is not. And, ultimately, many of the applications do have durability requirements for some of the data.
So, MongoDB's engineers are pulled in one direction by actual (paying) users, and the opposite direction by the money people. It's not a good place to be. They have very competent engineers, but they have set themselves a problem that might not be solvable under their constraints, and that they might not be able to prove they have solved, if they did. Time spent on it does not address what most customers want to see progress on.
MongoDB started life as a database designed for speed and ease of use over durability. That's not a good look for a database.
People have told me that they have since changed, but the evidence is overwhelmingly and repeatedly against them.
They seem to have been successful on marketing alone. Or people care more about speed and ease of use than durability, and my assumptions about what people want in a database are just wrong.
The Jepsen analysis : https://jepsen.io/analyses/mongodb-4.2.6
I wonder if I'm the only sysadmin in the world who doesn't hate MongoDB. Yes, I wouldn't use it for new projects, and yes, I wish RethinkDB had taken its place, but it's not as horrible as people seem to think. Default configuration... If it weren't for RDS doing PG-bouncer-style connection management, 95% of production postgres instances would probably fail. If innodb_buffer_pool_size weren't set properly, plenty of data centers would light on fire. If no one set up a firewall or AOF for redis, it's data-loss and data-exposure waiting to happen. If no one adds auth to an HTTP route, it's open to the world, etc etc etc. If tech-stacks were legos, software engineers would earn a heck of a lot less.
I absolutely agree it's been used by people who just don't want to write SQL queries, or used as a text search engine in place of something more appropriate like ElasticSearch, but mocking successful projects that were based on it seems silly. It reminds me of interviewing candidates at a startup who primarily used PHP/MySQL. Most of them openly laughed and called it all horrible. I voted "no" on them, and sometimes injected a somewhat toxic "ah, you're right - we should close up shop. Someone call Facebook - tell them their tech stack is horrible - shut it all down!".
You can learn a lot about a developer by asking "What do you think about Mongo, JavaScript, or PHP", and if their response isn't a shrug, they're probably more concerned with what editor is correct than if the product they're building is useful. It's an exceptional filter to reject zealots and find pragmatists.
All that said, MariaDB with MyRocks is _awesome_, but certainly not with the default settings :)
There is much amusement to be obtained from reading Jepsen's report:
"MongoDB’s default level of write concern was (and remains) acknowledgement by a single node, which means MongoDB may lose data by default.
...Similarly, MongoDB’s default level of read concern allows aborted reads: readers can observe state that is not fully committed, and could be discarded in the future. As the read isolation consistency docs note, “Read uncommitted is the default isolation level”.
We found that due to these weak defaults, MongoDB’s causal sessions did not preserve causal consistency by default: users needed to specify both write and read concern majority (or higher) to actually get causal consistency. MongoDB closed the issue, saying it was working as designed"
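To make the quoted point about single-node acknowledgement concrete, here is a minimal toy model in Python. It is not MongoDB's replication protocol; `ToyReplicaSet`, `Node`, and `acknowledged_write_survives` are made-up names for illustration, and the election rule (most up-to-date survivor wins) is a simplifying assumption. It only shows why a write acknowledged by one node can vanish when that node fails, while a majority-acknowledged write survives.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica-set member holding an ordered log of writes."""
    name: str
    log: list = field(default_factory=list)


class ToyReplicaSet:
    """Toy model only: a primary plus secondaries, no real networking."""

    def __init__(self, size=3):
        self.nodes = [Node(f"n{i}") for i in range(size)]
        self.primary = self.nodes[0]

    def write(self, value, w):
        """Apply a write and acknowledge once `w` nodes hold it.

        `w` is an int or the string "majority". Nodes beyond the
        acknowledged count are assumed not to have replicated it yet.
        """
        needed = len(self.nodes) // 2 + 1 if w == "majority" else w
        self.primary.log.append(value)
        others = [n for n in self.nodes if n is not self.primary]
        for node in others[: needed - 1]:
            node.log.append(value)
        return True  # acknowledged to the client

    def fail_primary(self):
        """Primary crashes; the most up-to-date survivor takes over."""
        survivors = [n for n in self.nodes if n is not self.primary]
        self.nodes = survivors
        self.primary = max(survivors, key=lambda n: len(n.log))

    def read(self):
        """Return a copy of the current primary's log."""
        return list(self.primary.log)


def acknowledged_write_survives(w):
    """Write once with concern `w`, kill the primary, check the write."""
    rs = ToyReplicaSet(size=3)
    rs.write("order-42", w=w)
    rs.fail_primary()
    return "order-42" in rs.read()


print(acknowledged_write_survives(1))
print(acknowledged_write_survives("majority"))
```

In this sketch the client got an acknowledgement in both cases; only the `w=1` case loses the write after failover, which is the "may lose data by default" behavior the report describes.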
MongoDB is horrible, I get it.
What do I use in this situation:
1) I need to store 100,000,000+ json files in a database
2) query the data in these json files
3) json files come from thousands upon thousands of different sources, each with their own drastically different "schema"
4) constantly adding more json files from constantly new sources
5) no time to figure out the schema prior to adding into the database
6) don't care if a json file is lost once in a while
7) only 1 table, no relational tables needed
8) easy replication and sharding across servers sought after
9) don't actually require json, so long as data can be easily mapped from json to database format and back
10) can self host, no cloud only lock-in
Recommendations?
I think it's remarkable this report has been out for a week now and no one at MongoDB has commented on it. At least, not that I have seen.
"We found that due to these weak defaults, MongoDB’s causal sessions did not preserve causal consistency by default: users needed to specify both write and read concern majority (or higher) to actually get causal consistency. MongoDB closed the issue, saying it was working as designed, and updated their isolation documentation to note that even though MongoDB offers “causal consistency in client sessions”, that guarantee does not hold unless users take care to use both read and write concern majority. A detailed table now shows the properties offered by weaker read and write concerns."
That sounds like a valid redress, or am I missing something ?
Oh, Jepsen and MongoDB again? Somebody get the popcorn!
I still remember when MongoDB was the new kid on the block and it was lauded as the only thing you should be using here on HN.
I'm glad my gut instinct was correct and that it really wasn't worth the hype. It reminds me of Ruby on Rails.
Every time I see a post about Mongo it makes me wonder what could have been if RethinkDB was managed differently.
I worked at one company where the network traffic just on the MongoDB master was around 2gb/s. We had machines with terabytes of memory, and Mongo worked fine - until we had some replica set nightmares. Mongo support is amazing, but when replication breaks it's very hard to diagnose (usually it was our fault, but it felt very fragile).
I used MongoDB for 1 year for a multi-million-user app. I abandoned it. The reliability and stability are just not good. I wanted it to be good, but it turned out to be a different
Ok, so defaults suck, marketing is misleading, documentation and error messages are not exactly obvious. Assuming you are already stuck in the soup, putting those issues aside and getting practical instead of throwing more fire on the discussion:
If you set w: majority and r: linearizable/snapshot, both on collection, client and on transactions. Plus assuming you accept snapshot over Isolation. How bad are those remaining cases in reality and how do these issues compare to other databases? The final "read your future writes" error looks quite scary and does not seem to be caused by configuration error, same with "duplicate effects".
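For anyone wanting to try the stricter settings mentioned above, here is a rough sketch of setting them as client-level defaults through the connection string. `w`, `journal`, `readConcernLevel`, and `replicaSet` are standard MongoDB connection string options; the helper name `strict_mongo_uri`, the host names, the database name, and the replica set name are placeholders I made up. It only covers client defaults: collection-level and transaction-level concerns still have to be set separately, and whether your driver accepts a given read concern level here is worth checking against its docs.

```python
from urllib.parse import urlencode


def strict_mongo_uri(hosts, database, replica_set, read_level="majority"):
    """Build a mongodb:// URI requesting majority writes and strict reads.

    Hypothetical helper: it only assembles a string, it does not connect.
    """
    options = {
        "replicaSet": replica_set,
        "w": "majority",                 # ack only once a majority has the write
        "journal": "true",               # ...and it has reached the journal
        "readConcernLevel": read_level,  # e.g. "majority" or "linearizable"
    }
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


# Placeholder hosts and names, for illustration only.
uri = strict_mongo_uri(
    ["db1.example:27017", "db2.example:27017"], "app", "rs0"
)
print(uri)
```

Putting the concerns in the URI means every client built from it starts from the strict settings instead of the weak defaults, so a forgotten per-operation option fails safe rather than fast.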
Discussed previously:
Our company migrated away from MongoDB, here's a talk about how we did it, in case you're thinking about what is involved and how to do it safely: https://www.youtube.com/watch?v=Knd3m2qh0o8
Ubiquiti used MongoDB for their CloudKey Gen1 series. When there was an unexpected shutdown, there’s a random chance it would lose its configuration [1]. If your SD backup didn’t work, you’d lose configuration for all WiFi hotspots. If you did client installs like I did, this was a total nightmare. How did they solve it? Release new, more expensive hardware with a battery backup acting like a UPS. Never solved Gen1 issues. Imagine your phone corrupting after a hard reset. Thanks Ubiquiti & MongoDB
[1] https://community.ui.com/questions/MongoDB-corrupt-after-eve...
If you want to be "that guy" at parties, ask people what MongoDB is trying to solve. If they bring up the typical "noSQL document store" stuff, ask them why you'd want to use MongoDB for that.
MongoDB uninstalled our cloud hosted cluster once and the site was down and we needed to set up a large database from backups. Their response was very unhelpful. I would never touch MongoDB again.
Regardless of tech, MDB is a weird stock that goes up steadily every time.
It looks like relatively few people clicked through to read the analysis itself, so @dang's kindly offered to repost it. You can find the analysis here:
https://jepsen.io/analyses/mongodb-4.2.6
... and the corresponding HN thread here:
If you're looking for MongoDB done right, it does exist and it's called RethinkDB. For some reason it didn't catch on and become popular — but it's nicer, and most importantly, it doesn't lose your data.
Data point: I have been running my production system (a fairly complex SaaS) on RethinkDB for the last 4 years.
Main argument for using document-oriented databases: https://martinfowler.com/bliki/AggregateOrientedDatabase.htm...
Does anyone have a recommendation for a NoSQL database?
https://news.ycombinator.com/item?id=23253870
(not Mongo obviously)
[repost - asking for help] I am disappointed with the direction that MongoDB took these past few years. Going ACID shows in benchmarks [1] and it’s not advisable if you are using MongoDB for stats and queues. (No one uses MongoDB for financial transactions despite the changes.)
And the recent change to a restrictive license is worrisome as well. I have been thinking of forking 3.4 and making it back into “true” open source with awesome performance. (If any C++ devs want to help out, reach out to me! username @gmail.com)
<rant>
This corruption is brought on by the stock market.
Have a look also at Shopify. They go and tack on 2% fees when customers use Google Pay or Apple Pay to checkout with. They recently announced that FB would be pulling ecom sales within app, and yet Shopify plans to charge 2% on top of FB fees. That's what I could gather despite the pricing being rather opaque.
Is this a step forward or backwards? Charging 2% / transaction for modern Internet protocols running on cheap hardware across a public network?
</rant>
Obligatory https://www.youtube.com/watch?v=b2F-DItXtZs
Can anyone share any positive experiences with MongoDB? I don't think MongoDB is perfect, like any other piece of tech, but the unanimous hatred for it seems a little overblown. Not trying to discredit the bad experiences people have had with it. Just curious to know where people are using it successfully.
This has been a known issue for a while:
https://hackingdistributed.com/2013/01/29/mongo-ft/
MongoDB: Broken By Design
MongoDB is the /dev/null of databases
How is Cassandra as an alternative to MongoDB?
Dan Luu suggested on Twitter that MongoDB trolled Kyle into testing Jepsen again. I think they've made a mistake though. :-)
It seems that the only tangible benefit remaining for DocumentDBs over SQL platforms (PostgreSQL, SQL Server, etc.) is scalability. Jr. devs thinking they can have a career in software dev without learning SQL is not a benefit.
MongoDB is a joke. Tried to use it on a project 3 years ago: it consistently and repeatedly lost new data on non-replicated, single-instance servers. I don’t understand how anyone can use it.
Typical HN posts of late hating on Javascript and MongoDB from database elitists -- the thing is there's a tool for a job and as engineers we need to figure out what tool best suits our use cases. It could very well be a NoSQL database such as Mongo or a relational one like Postgres or MySQL.
In the circles I run in, MongoDB is regarded as a joke and the company behind it as basically duplicitous. For example, they still list Facebook as their first user of MongoDB on their website, but there is no MongoDB use at Facebook and hasn't been for years (it came in only via a startup acquisition).
I had the misfortune to use MongoDB at a previous job. The replication protocol wasn't atomic. You would find partial records that were never fixed in replicas. They claimed they fixed that in several releases, but never did. The right answer turned out to be to abandon MongoDB.