Cracking the Hebrew Code
Prof. Avi Shmidman of The Joseph and Norman Berman Department of Literature of the Jewish People holds a manuscript from the Cairo Genizah containing a “qerovah poem,” or a poetic alternative to the Shmoneh Esreh, for the 10th of Tevet.
The Faculty of Jewish Studies
By building AI tools that can navigate the notoriously ambiguous Hebrew language, the linguist, computer scientist, and scholar of medieval Jewish texts Dr. Avi Shmidman is enabling new avenues of inquiry and new perspectives on Jewish life throughout history.
The Right Man at the Right Time
In 2015, when the field of digital humanities was just starting to take shape, “universities were struggling to build teams that had all the component parts,” recalls Dr. Avi Shmidman, a scholar of medieval Jewish poetry who by then had spent more than a decade building devices for the military avionics sector. “The Jewish studies researchers wanted computer scientists to help them transcribe and search handwritten texts, but computer scientists insisted they first needed linguists to ‘disambiguate’ the Hebrew language,” which, explains Shmidman, suffers from “extreme ambiguity” on account of its lack of written vowels, its attached prefixes, and its absence of capital letters.
Fortunately, alongside his training in computer science, Shmidman had also pursued studies in linguistics. Joking that he was effectively all three team members in one, he seized the opportunity to do what he calls “foundational work”: building tools that allow scholars to uncover trends, make new connections, and find fruitful new avenues of study within the corpus of Jewish texts.
“In the past,” explains Shmidman, “if a scholar wanted to find all the places where a certain halachic opinion was given, he would have to search through texts himself. There was no way to automate the process, because ensuring relevant results meant that the algorithms had to ‘understand’ the meaning of the words and structures in every instance.” Now, however, Shmidman’s algorithms can do just that, helping scholars trace the history of a ruling, a prayer, or an idea, and see the many vicissitudes that shaped the version we have today. This is precisely the goal of the six-year, €10 million ERC Synergy Grant awarded jointly in 2022 to Shmidman and his partners at Sciences-Lettres University in Paris and Tel Aviv University. The project is developing algorithms capable of extracting text from handwritten Hebrew manuscripts. In addition, the grant’s partners are developing an algorithm for paleography, which can determine when a text was written from the shape of its handwritten letters. Once these algorithms can transcribe, decipher, and catalogue all the ancient texts in the National Library of Israel, they can facilitate the automated searches that lead to new discoveries.
“The path from building devices for military aircraft to digital tools that decipher Jewish texts is certainly unusual,” concedes Shmidman. “But I like to think that in both cases, I’ve helped the Jewish people to reach new heights.”
Together Again: Joining a Lost Manuscript
Usually, genizah researchers work with online images of fragments and their digitally calculated measurements, which makes it easy to identify parts of the same manuscript. In one instance, however, the digital measurements provided for a Jerusalem Genizah fragment were substantially narrower than those of a Cambridge fragment that Shmidman was certain had originated from the very same codex. Suspecting a mistake, Shmidman inspected the physical specimen (shown here). To his surprise, he found that the edge of the fragment had been folded back against the other side of the page. After carefully unfolding the flap with the archivists at the National Library of Israel, he discovered that its width precisely fit that of the Cambridge fragment. Undeniably, these two fragments, stored in two different collections in two different parts of the world, form one continuous prayer manuscript, now joined together again. His finding was published in Hebrew Union College Annual, a prestigious journal in the field of Jewish studies.