I've always heard that "security through obscurity" is discouraged because, well, there's no stopping someone from digging in and figuring it out. However, in this case it seems somewhat successful, in that the author was not able to decrypt the packets directly.
The article says that "while it might seem feasible to reimplement these functions in Python without running the client, several factors make this approach impractical" and then lists some reasons like the lookup tables changing, chunk layouts getting shuffled, etc.
Is that all it takes to thwart decrypting the packets? Even though, presumably, you have access to all those lookup tables and chunk layouts somewhere in the client? Is it just too much effort to piece together how it works? I'd be curious to hear more specifics on how exactly Riot was able to make reverse engineering this so impractical.
Great article!
This is really something cool, and it is exactly what I was looking for. To give some context, I worked on some data-science-inspired studies [1] about LoL, and the future research direction is to provide a formal model of the games and analyze them through it. While I had some success getting aggregated data from websites such as uol.gg, the granularity is not fine enough to do very interesting analysis.
The World of Warships community has gone through similar steps, but the encryption is much more straightforward. Some of the packets are pickled Python, some are just binary blobs, so there are some undocumented packets but for the most part people have done a decent job of figuring it out and building tooling around it such as the minimap renderer: https://github.com/WoWs-Builder-Team/minimap_renderer
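To illustrate the two flavors, here's a toy sketch in Python (not how minimap_renderer actually does it; the framing is made up, and unpickling should only be done on replays you trust, since pickle can execute code):

import pickle

def decode_payload(payload: bytes):
    """Try to interpret a packet payload as pickled Python; otherwise keep it
    as an opaque binary blob for later manual reverse engineering."""
    try:
        return ("pickled", pickle.loads(payload))   # only safe on trusted replay files
    except Exception:
        return ("blob", payload)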
There’s an odd, unspoken but somewhat understood agreement between the developer (Wargaming) and the community, though: the community actively reverse engineers the game to document the packets and WG mostly looks the other way (except when they recently threatened me with a perma ban :), and they even use the tooling the community creates for official tournaments.
In this article the author mentions Riot partnering with external companies to provide richer data sets and analytics. Do they use these tools/data sets for tournaments as well? Is it known at all how these partnerships are structured?
> League of Legends runs on a custom game engine developed in 2009.
Developed by Sergey Titov (same engine that powers Big Rigs).
I'm not very well versed in RE, but I know that competitive games like this spend a lot of effort on preventing you from attaching debuggers, hooking, and decompiling.
Bypassing this is not mentioned at all in the article. Is that because these protections are trivial to bypass for experienced people, or because the author wants to hide their method from the devs?
I did something similar with a friend for some time for another game.
As it went, our data was used to prove things to the developer that they would have loved to keep hush-hush, which led to a cat-and-mouse game with the data and their open and... not so open APIs. In the end, we stopped playing the game and wound down our efforts. Fun times.
Getting data by directly processing the packets instead of using the (buggy, slow) replay system is a great idea. There's a lot of interesting mid-game data in the LoL game state that is missing from summary overviews, which only consider the final state of the game.
One of the cool things about Dota is that OpenDota and Stratz provide a lot of data because Steam is relatively open.
It's how I wrote a blog post on generating builds for heroes before Dota Plus even had the feature!
Where/how are images like this made? They're cool. Technical and communicative, but with a relaxed and casual look and feel.
https://maknee.github.io/assets/images/posts/2024-11-02/leag...
I remember doing this 10+ years ago now for a site called Probuilds. I left LoL shortly after this. Cool to see that the packets haven't changed much (based on my memory).
Shortly after I released this for TSM, Riot came out with the API.
I've been working on something similar [1], but I took a different approach: I statically extract all decryption stubs using an IDA script I wrote, then emulate them with Unicorn. I'm also interested in your implementation details. Do you have your code on GitHub or somewhere else?
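For context, here's a minimal sketch of the emulation half of that approach, assuming the stub has already been dumped from the client. The load address, stub bytes, and calling convention (ciphertext in RDI, plaintext out in RAX) are placeholders I made up, not details from the article or from my actual tool:

from unicorn import Uc, UC_ARCH_X86, UC_MODE_64
from unicorn.x86_const import UC_X86_REG_RDI, UC_X86_REG_RAX

CODE_BASE = 0x140001000               # hypothetical address the stub was dumped from
STUB = bytes.fromhex("4831f8c3")      # placeholder stub: xor rax, rdi; ret

def run_decrypt_stub(encrypted: int) -> int:
    mu = Uc(UC_ARCH_X86, UC_MODE_64)
    mu.mem_map(CODE_BASE, 0x1000)              # map one page for the stub
    mu.mem_write(CODE_BASE, STUB)
    mu.reg_write(UC_X86_REG_RDI, encrypted)    # ciphertext goes in the first argument register
    # Stop right at the final `ret` so we never need to set up a stack.
    mu.emu_start(CODE_BASE, CODE_BASE + len(STUB) - 1)
    return mu.reg_read(UC_X86_REG_RAX)         # decrypted value comes back in RAX

print(hex(run_decrypt_stub(0xDEADBEEF)))       # the placeholder stub just XORs against zeroed RAX

The nice part of emulating the extracted stubs instead of reimplementing them is that patch-day changes to the lookup tables don't matter; you just re-extract and re-run.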
A tip: the diagrams are not visible in dark mode. The following CSS fixes them:

@media (prefers-color-scheme: dark) {
  img[src*="svg"], img[src*="png"] {
    filter: invert(1) hue-rotate(180deg);
  }
}
Really cool project! I am not sure if this is only me, but your dark theme is hiding the illustrations fyi.
GTFO hackernews, we only play Dota2 here.
I worked on something like this back in 2016; I'm not sure how much things have changed since then. I used dynamic binary instrumentation to deal with the field encryption. Basically, manually map the executable into executable memory on Linux (as if it were a shared library). Begin execution at the packet switch, but before executing a block of code, disassemble it until a conditional branch and modify it according to some heuristics to remove the at-rest encryption. The original block of code wasn't executed, since the modified code might not fit into the original block size, so new blocks were mmap'd for this.

Malloc/free were hooked and replaced with wrappers over glibc's malloc/free, but with bookkeeping so that the memory could be freed after execution of the packet switch. atexit was just replaced with a no-op.

That all just dealt with the encryption, but there were also randomized packet IDs and field orders. Those problems were dealt with by using manually written heuristics based on the packet IDs, which were actually interesting. Packet handlers with references to text strings (even hashed ones) were a gold mine here because they made static detection of packet IDs simple. If there was no text string, many of the offsets could be auto-detected just by parsing a replay and running small snippets to determine which offsets actually "made sense" for the field being searched for. For example, in a gold-gain packet, the amount of gold gained shouldn't be out of an expected range, or else the offset likely doesn't correspond to that field.

Once all of the high-volume code blocks had been instrumented, replays could be parsed in 2-3 seconds (along with generating the desired data aggregations). This is all from memory, so it's possible there could be a minor mistake or two.
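To make the "made sense" idea concrete, here's a rough Python sketch of that offset-detection heuristic as I remember it; the payload layout, field width, and gold bounds are invented for illustration:

import struct

def plausible_gold_offsets(payloads, max_gold_per_event=2000):
    """Keep only the byte offsets where a 4-byte little-endian field looks like
    a sane 'gold gained' value in every sampled payload."""
    candidates = None
    for payload in payloads:
        ok = set()
        for off in range(0, len(payload) - 3, 4):        # assume 4-byte aligned fields
            (value,) = struct.unpack_from("<i", payload, off)
            if 0 < value <= max_gold_per_event:          # out of range => probably not the gold field
                ok.add(off)
        candidates = ok if candidates is None else candidates & ok
    return sorted(candidates or [])

Intersecting the surviving offsets across a whole replay's worth of packets usually narrowed things down to one or two candidates, which could then be confirmed by hand.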