See also Nat’s twitter announcement: https://twitter.com/natfriedman/status/1712470683207532906
$700k is a life changing amount of money. I admit, it’s tempting to drop everything and go devote myself like a monk to the pursuit of ancient enlightenment via modern ML. I wonder where we’d start…
It’s also funny that the scroll might just be a laundry list.
I highly recommend getting into classical literature. It is incredible and beyond conception — I had a light scattering in my education, but only later and recently have I discovered how incredible it can be.
Suggested Reading for beginners:
* Life of Pythagoras, by Iamblichus
* The Golden Ass, by Apuleius of Madauros (specifically, the translation by Robert Graves)
* Life of Alexander by Plutarch
* Education of Cyrus by Xenophon
* Parmenides by Plato
Also, I have found SHWEP.net to be invaluable as a gentle yet rigorous guide through many classics, though it takes an esoteric bent (which I love).
I recently saw a wonderful YouTube video on this: https://www.youtube.com/watch?v=Z_L1oN8y7Bs
Title: Herculaneum scrolls: A 20-year journey to read the unreadable
It goes a little into the technology behind this: deep learning finally cracked the code. They had the scans for a decade, but it took ML training to identify which parts were papyrus and which parts were the ink on top. As the video says, this had been done 20 years ago on a different set of scrolls with easier-to-read, higher-contrast materials. Deep learning is cracking datasets we had previously thought were impossible to solve algorithmically.
This is the 21st-century equivalent of living through the opening of Tut's tomb. Incredible to think there's a very real chance that in the medium-term future you might be able to buy a copy of a newly-translated work on Amazon that hasn't been read for millennia.
That's extremely cool. I wonder what we'll learn.
As an aside, the "Professor Seales and team scanning at the particle accelerator" photo looks like it came from a TV show. "If we keep telling the computer 'enhance', we'll be able to read it".
This might count as one of the most extreme stories of data recovery I've seen. I wonder if in another 2000 years we'll have a "first file discovered on discarded hard drive platter".
The first 30 minutes of this interview does a great job of explaining what's going on here. Really interesting stuff.
> In early August, contestant Casey Handmer, an ex-JPL startup founder and polymath
interesting terminology, I've never been given accolades for being a multifaceted human being.
I've gotten "generalist" and "after much consideration, we have decided not to proceed with your candidacy".
https://scrollprize.org/ explains the original challenge/issue: other research showed "virtual unwrapping" based on CT scans was possible, but these scrolls had ink not clearly visible on CT/X-ray, so they had to go back to less visible structure (I'm not sure if it's the structure of the paper that changes or actually the ink being visible but less).
I wonder how many AI massage iterations this is away from being able to completely copy arbitrary books without opening them, and if this technology will hasten the demise of paper texts.
If you can just stack 20 random books and within seconds have them be indexed and searchable digital ones, libraries as we know them will suffer perhaps the final blow in obsolescence.
I wrote this for a different community (filled with semiliterate sophists), but this is absolutely huge and could upend huge swathes of understanding about the last two thousand years.
You can avoid the longform essay below if you want. The short of it is there are several potentially common works possibly in the library that could directly prove or disprove what is found in the New Testament and the predicates of Rabbinic Judaism as established at the Council of Jamnia.
We could be seeing the beginning of conclusive proof that invalidates the narratives of Christianity, Judaism, and Islam by the end of the year.
The Vesuvius Challenge isn't just an interesting contest in the machine learning realm; it's a groundbreaking endeavor that could redefine our understanding of the humanities if successful. The opportunity to digitally unroll and read the Herculaneum Papyri could offer unprecedented insights into ancient civilizations and the total feedstock of civilization today. This is not merely about filling in some historical gaps; it’s about fundamentally altering how we understand antiquity and, by extension, our own intellectual heritage.
The loss of the Library of Alexandria has long been considered a "dark age" event for intellectual progress. Now, consider the Herculaneum library, a collection of papyri from a villa once owned by Julius Caesar's father-in-law, carbonized but preserved by the Vesuvius eruption in 79 AD. Hundreds of these scrolls are unreadable because their carbon-based ink blends in with the carbonized papyrus and is thus invisible to conventional imaging techniques. Yet these scrolls are quite possibly on the cusp of revelation.
Recent developments have introduced machine learning and high-resolution X-ray scans as methods for reading these "unreadable" scrolls. What texts do they contain? Treatises on science and philosophy? The lost books of Livy? The epic cycle? Governmental policies like the Twelve Tables? It’s a tantalizing question because whatever is locked in those scrolls could be an unfiltered look at the Roman Empire—an empire that fundamentally influenced the trajectory of Western culture, religion, governance, and philosophy.
Ponder a history of Rome that has not been retouched by myriadic emperors, by Constantine's Christianity, or the interpretive lens of the Roman Catholic Church. Unmediated accounts of Roman society, unaltered by the layers of religious and political power that came later, could rewrite our textbooks and shift the justification of history. It’s not just about enriching our understanding of ancient civilizations; this could be a cornerstone on which to build a fresh philosophical understanding of human society.
If the project succeeds, there will be repercussions in the academic realm. The humanities have long struggled to justify their existence in a world that increasingly prizes STEM and lacks any novel sources for the classical world. Suddenly, there could be a concrete, urgent task at hand: to decode, interpret, and integrate an influx of new knowledge. The Vesuvius Challenge could revitalize the field, offering an unforeseen but compelling reason for its study. In essence, it provides a utilitarian justification for the humanities, one that transcends 'cultural enrichment' and enters the realm of 'historical redefinition.'
The Vesuvius Challenge could be the hinge upon which history swings, yielding intellectual treasure that could be as groundbreaking as the writings that were lost in Alexandria. For millennia, those scrolls have remained unread. Now, it's a software problem. That's not just a challenge; it’s an imperative.
The presence of specific works in the Herculaneum Papyri could dramatically impact our understanding of major historical events.
In particular for me, I pray that the biography of Herod the Great by Nicholas of Damascus is discovered intact. While mainstream accounts generally portray the life of Herod within the context of Roman patronage and Judaean politics, uncovering a contemporary account by a close intimate (and used as a primary source by Josephus) would offer fresh, unmediated insights into his rule and its socio-political intricacies. Chronologies of the life of Jesus could be explicitly validated or disproved.
The relevance here is far from academic. Consider the following naturalistic hypothesis: that the inception and rise of Christianity was entirely a dynastic struggle within the Hasmonean-Herodian line. What if the tale of Jesus is, in essence, a dramatized, mystified rendition of a 1st-century dynastic conflict, one that was subsequently co-opted and transformed into a religious narrative by an early form of conspiratorial thinking? Something like a 1st-century version of QAnon, distorting real events to serve an alternative, concealed agenda in the aftermath of the First Jewish-Roman War.
Unveiling a document like Nicholas of Damascus' biography could be groundbreaking in testing such a hypothesis. If Herod's life and rule were detailed without the religious overlays that later Christian interpretations bring into the picture, one could make more definitive assertions about the socio-political environment of the time. Furthermore, it could provide concrete evidence to either substantiate or refute theories about Christianity's emergence as a byproduct of a Herodian-Hasmonean power struggle.
The fact that such a theory could be tested is significant in its own right. Traditionally, discussions about early Christianity rely heavily on religious texts and subsequent historical accounts, many of which are fraught with dogma and ideological interpretations. A primary source devoid of such influences would be a game-changer, offering a baseline of raw data from which more accurate and reliable hypotheses could be drawn.
And it's not limited solely to Christianity. The implications for Rabbinic Judaism could be equally monumental. The owner of the villa, likely a wealthy Roman, would be unlikely to have had any primary Hebrew texts like the Pentateuch. However, that doesn't rule out the possibility of possessing Greek or Latin works discussing Jewish culture, beliefs, and politics. Given the villa's historical context, it's conceivable that there might be indirect ethnographic accounts from the period after the destruction of Jerusalem in 70 AD but before the Council of Jamnia, traditionally dated around 90 AD, which helped canonize Hebrew scriptures.
Why is this important? The Council of Jamnia is often cited as a crucial moment for the development of Rabbinic Judaism. It allegedly led to the fixing of the Hebrew Bible canon and crystallized what would become Talmudic tradition. If documents were to surface that provide a snapshot of Judaic thought and practice just before this council, it could upend millennia of precedent and identity.
In a broader context, discovering pre-Jamnia ethnographic sources could significantly change our understanding of how Judaism adapted and evolved in the aftermath of the Second Temple's destruction. This could lead to far-reaching questions. How much of the Talmudic tradition was actually a post-hoc rationalization or systematization of beliefs and practices that were far more fluid before the Council of Jamnia? How much anti-Romanism was pared away to prevent suppression? Moreover, how would such a revelation interact with or even challenge the validity of current Rabbinic and Orthodox Jewish practices?
The implications for the Judeo-Christian heritage as a whole are staggering. If both Christianity and Judaism could be traced back explicitly to politically or socially motivated machinations, rather than divinely inspired or time-honored traditions, the entire foundation of Judeo-Christian culture would come into question. In essence, the Vesuvius Challenge has the potential to destabilize two of the world’s major religious traditions at their historical roots. It is difficult to overstate the potential impacts.
The Vesuvius Challenge is not just an academic or technological endeavor. Its success could instigate an unparalleled epistemological crisis in religious studies and the humanities. It provides the opportunity to re-examine, with primary sources, the historical foundations of Western religious, cultural, and ultimately political traditions. We're not just potentially rewriting history here; we're reevaluating the very frameworks through which that history has been understood.
From my perspective, this may just be fitting to the desired answer. We are trying to find symbols without any confidence that the papyrus still contains information, and without any text to train on. With the "right" network you can achieve any possible result. It reminds me of a Russian crank scientist who tried to read words and texts in detailed photos of the sun's surface.
This, along with the way the Antikythera mechanism [1] fragments were decoded [2], always brings to mind the ending of "A.I.: Artificial Intelligence", where aliens far into the future managed to recreate part of human civilization from remaining artifacts and a barely working robot child.
[1] https://en.wikipedia.org/wiki/Antikythera_mechanism [2] https://www.youtube.com/watch?v=6Wp3wL8g2Eg
How can they be sure that the results are actual words from the scroll and not just hallucinations of the neural network? If the scrolls are in such bad condition that the data is almost only noise then what stops their high-tech deep learning model from just.. making it all up?
It is amazing what some college student can pull off with today's technology.
OT: never before have i seen a 2 days old story coming back to the front page
> Note that texts from this time didn’t use spaces, making it harder to determine word boundaries.
Paging Germans
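For anyone curious how word boundaries can be recovered from unspaced text, here is a minimal sketch. It assumes you already have a word list (the `LEXICON` below is a made-up toy in English, not a Greek dictionary) and simply enumerates every way to split a string into known words. I'm not claiming this is what the papyrologists do; it only illustrates why missing spaces create ambiguity.

```python
from functools import lru_cache

# Hypothetical toy lexicon, for illustration only.
LEXICON = frozenset({"GOD", "IS", "NOW", "HERE", "NOWHERE"})


def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    max_len = max(len(word) for word in lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the text means one complete split was found.
        if start == len(text):
            return ((),)
        results = []
        # Try every candidate word that begins at `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(split) for split in split_from(0)]


readings = segmentations("GODISNOWHERE", LEXICON)
# readings == [["GOD", "IS", "NOW", "HERE"], ["GOD", "IS", "NOWHERE"]]
```

The same letters yield two valid readings, which is the kind of ambiguity a human reader has to resolve from context.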
Not bad seeing as he solved this while working as an intern at SpaceX too!
i love this project. i feel like this is going to be a great source of interest and value over the next few years (and potentially immeasurable value over longer time frames).
Somewhat off-topic but if you clicked in here, you might be interested in this book: "The Riddle of the Labyrinth: The Quest to Crack an Ancient Code".
On a side note, iirc there were also some Dead Sea scrolls that were hard to open. Were they able to open and read all of the scrolls?
The word is Ancient Greek for “purple”... it takes a lot of reading to find out!
Based on this, I wonder if the main challenge has already been solved.
Are they controlling for this? How is the validation being done?
Imagine the person making this scroll 2,000 years ago thinking, 'I wonder if some kid 2,000 years in the future is going to win a boatload of money by reading this.'
I love uses of machine learning like this a thousand times more than generative LLMs spouting probable-sounding nonsense.
>Shortly after that, another contestant, Youssef Nader, independently discovered the same word in the same area, with even clearer results — winning the second place prize of $10,000.
That's what u get for optimising your code
"He found a few dozen ink strokes - and some complete letters - that could be labeled and used as training data.
Before long, the model was unveiling traces of crackle invisible to his own eye. Soon, these traces began to form letters and hints of actual words."
This does not sound like a "Large Language Model (LLM)" or other large set of training data, like the sort hyped by so-called "tech" companies; this sounds relatively small. What am I missing? (Besides brain cells.)
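To make the "relatively small" point concrete: a supervised classifier can be fit on a handful of labeled examples, with no language model involved. The sketch below is a nearest-centroid classifier, which I'm swapping in purely for illustration; the article describes a trained model but this is not it. The two features per patch (brightness and roughness) and all the numbers are invented.

```python
from statistics import mean


def fit_centroids(samples):
    """Average the feature vectors of each label.

    `samples` is a list of (feature_vector, label) pairs.
    """
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Hypothetical hand-labeled patches: [brightness, roughness].
labeled = [
    ([0.20, 0.90], "ink"),
    ([0.30, 0.80], "ink"),
    ([0.25, 0.85], "ink"),
    ([0.20, 0.10], "papyrus"),
    ([0.30, 0.20], "papyrus"),
    ([0.25, 0.15], "papyrus"),
]

centroids = fit_centroids(labeled)
guess_a = predict(centroids, [0.22, 0.80])  # close to the "ink" examples
guess_b = predict(centroids, [0.28, 0.12])  # close to the "papyrus" examples
```

Six labels are enough to separate these toy patches. Real CT data is far noisier, which is presumably why a few dozen labeled strokes plus a neural network were needed.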
> Casey was the first person in 2,000 years to find ink — and a letter — inside an unopened scroll.
Amusing that this implies the Vesuviuans had the ability to read unopened scrolls.
The lettering was found by looking for 'crackle' texture on papyrus segments from the CT scans that was obviously in the shape of Greek letters, and annotating those areas as training data. Unfortunately, such crackle texture isn't visible, at least by eye, on most of the papyrus. It's probably only that visible where the ink was very thick. You can easily see the difference in texture in this electron microscope image [1] (far higher resolution than the CT scans), especially on the very edge of the inked area where the ink was pushed to (the narrow strip in the left image; I think the whole right image is inked). I'm surprised the crackle was discovered only after the Kaggle Ink Detection contest. Looking at the CT-scanned fragments with infrared ground truths, which were used in the Kaggle contest, Casey Handmer wrote [2]:
> The ongoing apparent failure of deep-learning based ink detection based on the fragments indicated to me that direct inspection of the actual data would be more fruitful, as it has been here.
> ...
> I found similar “cracked mud” and “flake” textures corresponding to known character ink, but only for perhaps 10% of the known characters. It’s been a long day, I can probably find more on closer inspection, but that does make one wonder about automated ink detection and what that is seeing.
These new images are much better than I hoped for, but still only in one small area, so I'm still pessimistic about more than an odd sentence being readable.
[1] https://scrollprize.org/img/tutorials/sem.png
[2] https://caseyhandmer.wordpress.com/2023/08/05/reading-ancien...
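As a rough illustration of what "looking for a texture" could mean computationally, here is a sketch that flags pixels whose neighborhood is unusually rough. Treating high local variance as a stand-in for crackle is my own assumption, not Handmer's method or what the contest models do, and the image, `radius`, and `threshold` values are synthetic.

```python
from statistics import pvariance


def local_variance(image, row, col, radius=1):
    """Variance of pixel values in a square window, clipped at the image edges."""
    values = [
        image[r][c]
        for r in range(max(0, row - radius), min(len(image), row + radius + 1))
        for c in range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    ]
    return pvariance(values)


def crackle_mask(image, threshold, radius=1):
    """Mark pixels whose local variance exceeds `threshold` (assumed crackle proxy)."""
    return [
        [local_variance(image, r, c, radius) > threshold for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Synthetic 4x8 image: smooth on the left, checkerboard "crackle" on the right.
image = [
    [10 if c < 4 else (20 if (r + c) % 2 == 0 else 0) for c in range(8)]
    for r in range(4)
]

mask = crackle_mask(image, threshold=25)
# The smooth interior is not flagged; the checkerboard interior is.
```

A fixed threshold like this would fall apart on real scans, where the texture is faint and sits inside a 3D volume, so take it only as a picture of the idea.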