One of my favorite algorithms for this is Expectation Maximization [0].
You would start by estimating each driver's rating as the average of their ratings, and then estimate the bias of each rider by comparing the average rating they give to the estimated scores of their drivers. Then you repeat the process iteratively until both scores (driver rating and rider bias) converge, roughly as in the sketch below.
[0] https://en.wikipedia.org/wiki/Expectation%E2%80%93maximizati...
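For anyone curious, here's a minimal sketch of that alternating scheme. It's not full EM, just the two averaging steps described above, and all names and defaults are made up for illustration:

```python
from collections import defaultdict

def estimate_ratings(rides, iterations=20):
    """Alternately estimate driver quality and rider bias until they settle.

    `rides` is an iterable of (rider_id, driver_id, rating) tuples.
    Returns (driver_score, rider_bias) dicts.
    """
    driver_score = defaultdict(float)
    rider_bias = defaultdict(float)

    by_driver = defaultdict(list)
    by_rider = defaultdict(list)
    for rider, driver, rating in rides:
        by_driver[driver].append((rider, rating))
        by_rider[rider].append((driver, rating))

    # Start with raw per-driver averages and zero rider bias.
    for driver, entries in by_driver.items():
        driver_score[driver] = sum(r for _, r in entries) / len(entries)

    for _ in range(iterations):
        # A rider's bias is how far their ratings sit above or below
        # the current estimates of their drivers.
        for rider, entries in by_rider.items():
            rider_bias[rider] = sum(r - driver_score[d] for d, r in entries) / len(entries)
        # Re-estimate each driver after removing rider bias.
        for driver, entries in by_driver.items():
            driver_score[driver] = sum(r - rider_bias[u] for u, r in entries) / len(entries)

    return dict(driver_score), dict(rider_bias)
```

With this, a habitually harsh rider ends up with a negative bias, which in turn lifts the estimated scores of the drivers they rated.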
Rating systems should really mature to exclude non-customers and to list the customer's purchase history.
Weighting by amount spent could be interesting.
Big vendors/companies should probably be required to have per-product ratings rather than that being optional. Rating Adobe or Alibaba in general is probably not all that useful.
The EU almost requires it, but Google (for example) still hasn't found a nice technical solution.
I like rating systems from -2 to +2 for this reason.
The big rating problem I have is with sites like BoardGameGeek, where ratings are treated by different people as either an objective rating of how good the game is within its category or a subjective measure of how much they like (or approve of) the game. They're two very different things, and it makes the ratings much less useful than they could be.
They also suffer from a similar problem in that most games score 7 out of 10: 8 is exceptional, 6 is bad, and 5 is disastrous.
I'd rather we just used a 3-point rating: 1. Bad, 2. Fine, 3. Great.
On a 5-point scale, 2 and 4 are irrelevant, a wild guess, or entirely user-specific.
Most of the time our rating systems devolve into roughly this state anyway.
E.g.:
5 is excellent, 4.x is fine, <4 is problematic.
And then there's a sub-range of the area between 4 and 5 where 4.1 is questionable, 4.5 is fine, and 4.7+ is excellent.
In the end, it's just 3 parts nested within 3 parts nested within 3 parts nested within....
Let's just do 3 stars (no decimal) and call it a day
> I'm genuinely mystified why its not applied anywhere I can see.
I wonder if companies are afraid of being accused of "cooking the books", especially in contexts where the individual ratings are visible.
If I saw a product with 3x 5-star reviews and 1x 3-star review, I'd be suspicious if the overall rating was still a perfect 5 stars.
A problem with accounting for "above average" service is that sometimes I don't want it. Even when a driver goes above and beyond, offering a water bottle or something else exceptional, I would occasionally rather just be left alone for a quiet, impersonal ride.
For Uber you don't need a rating at all. The tracking system already knows if they were late, if they took a good route, and if they dropped you off at the wrong location.
Anything really bad can be dealt with via a complaint system.
Anything exceptional could be captured with a free-text field when giving a tip.
Who is going to read all those text fields and classify them? AI!
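A toy sketch of what an implicit score from trip data might look like; the fields and weights here are invented for illustration, not anything Uber actually exposes:

```python
def implicit_trip_score(late_minutes: float,
                        detour_ratio: float,
                        wrong_dropoff: bool) -> float:
    """Toy implicit rating on a 1-5 scale derived from trip telemetry.

    All field names and weights are hypothetical.
    """
    score = 5.0
    score -= min(late_minutes, 20) * 0.1        # up to -2 for lateness
    score -= max(detour_ratio - 1.0, 0) * 2.0   # penalize routes much longer than the estimate
    if wrong_dropoff:
        score -= 1.5
    return max(1.0, min(5.0, score))
```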
One formal measure of this is Inter-Rater Reliability.
I have often had the same thought, and I have to believe the reason is that the companies' bottom line is not impacted the tiniest bit by their rating systems. It wouldn't be that hard to do better, but anything that takes a non-zero amount of attention and effort to improve has to compete with all of those other priorities. As far as I can tell, they just don't care at all about how useful their rating system is.
Alternatively, there might be some hidden reason why a broken rating system is better than a good one, but if so I don't know it.
Isn't this basically a de-biasing problem? Treat each rider's ratings as a random variable with its own mean μᵤ and variance σᵤ², then normalize. Basically compute z = (r − μᵤ)/σᵤ, then remap z back onto a 1–5 scale so "normal" always centers around ~3. You could also add a time decay to weight recent rides higher to adapt when someone's rating habits drift.
Has anyone seen a live system (Uber, Goodreads, etc.) implement per-user z-score normalization?
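A minimal sketch of that per-rider normalization, assuming you have the rider's rating history handy; the remap of z onto a 1-5 scale centered on 3 is an arbitrary choice:

```python
import statistics

def normalize_rating(rating: float, rider_history: list[float]) -> float:
    """Remap a single rating onto a 1-5 scale centered on the rider's own habits.

    z = (r - mean) / stdev, then 3 + z, clamped back into [1, 5],
    so a rider's "normal" rating always lands around 3.
    """
    if len(rider_history) < 2:
        return rating  # not enough history to de-bias
    mu = statistics.mean(rider_history)
    sigma = statistics.stdev(rider_history) or 1.0  # guard against zero variance
    z = (rating - mu) / sigma
    return max(1.0, min(5.0, 3.0 + z))
```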
Does anyone else get that survey rating effect where you start off thinking the company is reasonable, so you give a 4 or 5, then the next page asks why you chose this, and as you think it through you realise more and more shitty things they did, so you go back and bring them down to a 2 or 3? Effectively, by asking in detail, they undermine your perception of them.
Check the bad reviews. If the 1-2 star reviews are mostly about the rude owner, then you know the food is good.
Has anyone done a forced ranking rating?
"Here's your last 5 drivers, please rank them"
I don't understand why letter grades aren't more popular for rating things in the US.
"A+" "B" "C-" "F", etc. feel a lot more intuitive than how stars are used.
A++++ article!
I give five stars always because I'm not a rat.
Same for peer reviews. Giving anything less than a four is saying fire this person. And even too many fours is PIP territory.
Similarly - one of my biggest complaints about almost every rating system in production is how just absolutely lazy they are. And by that, I mean everyone seems to think "the object's collective rating is an average of all the individual ratings" is good enough. It's not.
Take any given Yelp / Google / Amazon page and you'll see some distribution like this:
User 1: "5 stars. Everything was great!"
User 2: "5 stars. I'd go here again!"
User 3: "1 star. The food was delicious but the waiter was so rude!!!one11!! They forgot it was my cousin's sister's mother's birthday and they didn't kiss my hand when I sat down!! I love the food here but they need to fire that one waiter!!"
Yelp: 3.7 stars average rating.
One thing I always liked about FourSquare was that they did NOT use this lazy method. Their score was actually intelligent - it checked things like how often someone would return, how much time they spent there, etc. and weighted a review accordingly.
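Something in that spirit is easy to sketch. This isn't FourSquare's actual formula, just an illustration of weighting each review by an engagement signal like repeat visits; the fields and weights are hypothetical:

```python
def weighted_rating(reviews: list[dict]) -> float:
    """Average ratings weighted by an engagement signal instead of a flat mean.

    Each review dict is assumed (for illustration only) to carry
    'stars' and 'visit_count'; a repeat visitor counts for more than
    a one-off ranter.
    """
    total = 0.0
    total_weight = 0.0
    for review in reviews:
        weight = 1.0 + min(review["visit_count"], 10) * 0.5  # cap the boost
        total += review["stars"] * weight
        total_weight += weight
    return total / total_weight if total_weight else 0.0

# The three-review example above: two happy regulars outweigh one drive-by 1-star.
print(weighted_rating([
    {"stars": 5, "visit_count": 8},
    {"stars": 5, "visit_count": 3},
    {"stars": 1, "visit_count": 1},
]))  # ~4.3 instead of a flat 3.7
```

Capping the weight keeps one superfan from dominating the score, which is the same kind of judgment call any non-lazy aggregation has to make.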