Automatic Identification of Travel Locations in Rare Books - Object Oriented Information Management
Detlev Doherr, Andreas Jankowski
The digital content of the Internet is growing exponentially, and the mass digitization of printed media opens access to literature, in particular to the travel literature of the 18th and 19th centuries, which consists of diaries and travel books describing routes, observations and inspirations. Identifying the locations described in the digitized text is a long-standing challenge. It calls for information technology that supplies dynamic links to sources through new forms of interaction and synthesis between humanistic texts and scientific observations.
Using object-oriented information technology, a prototype software tool was developed that automatically identifies geographic locations and travel routes mentioned in rare books. The information objects contain properties such as names and classification codes for populated places, streams, mountains and regions. Together with the latitude and longitude of each location, these properties make it possible to geo-reference the information so that all processed and filtered datasets can be displayed in a map application. This method has already been used in the Humboldt Digital Library to present Alexander von Humboldt’s maps, and it was tested in a case study, based on the works of Alexander von Humboldt and Johann Wolfgang von Goethe, to verify the correctness and reliability of the automatic identification of locations.
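As a rough illustration of such information objects, the following minimal Python sketch models a location with a name, a classification code and coordinates, and matches a small gazetteer against a text. It is not the prototype described here: the class `Place`, the function `identify_locations`, the single-letter feature codes, the sample sentence and the approximate coordinates are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Hypothetical information object for one location."""
    name: str
    feature_class: str  # assumed codes: "P" populated place, "H" stream, "T" mountain, "A" region
    lat: float          # latitude in decimal degrees (approximate, illustrative)
    lon: float          # longitude in decimal degrees (approximate, illustrative)


# Tiny illustrative gazetteer; a real one would hold many thousands of entries.
GAZETTEER = [
    Place("Cumaná", "P", 10.46, -64.17),
    Place("Caracas", "P", 10.49, -66.88),
    Place("Orinoco", "H", 8.6, -62.2),  # one representative point for a river
]


def identify_locations(text, gazetteer=GAZETTEER):
    """Return gazetteer places in the order they are mentioned in the text."""
    hits = []
    for place in gazetteer:
        # Exact, case-sensitive whole-word match on the place name.
        pattern = r"\b" + re.escape(place.name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), place))
    hits.sort(key=lambda hit: hit[0])  # order of mention approximates the route
    return [place for _, place in hits]


# Invented sample sentence, not a quotation from any source.
sample = "From Cumaná we sailed to Caracas and later reached the Orinoco."
route = identify_locations(sample)
for place in route:
    print(place.name, place.feature_class, place.lat, place.lon)
```

Each returned object carries its own coordinates, so the ordered list could be handed directly to a map application as a sequence of points.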
The results reveal numerous errors caused by misspellings, by location names that have changed over time, and by common terms that are identical to location names. On the other hand, it becomes clear that the results of automatic object detection and recognition can be improved by error-free and comprehensive sources. As a result, an increase in the quality and usability of the service can be expected, accompanied by more options for detecting unknown locations in the descriptions of rare books.
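The error classes named above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the method evaluated in the case study: the name list, the `lookup` helper and the similarity cutoff of 0.8 are hypothetical, and approximate string matching with the standard library's `difflib` is swapped in here merely to show how a spelling variant might be caught.

```python
from difflib import get_close_matches

# Hypothetical list of known location names.
KNOWN_NAMES = ["Caracas", "Cumaná", "Essen"]


def lookup(token, names=KNOWN_NAMES, cutoff=0.8):
    """Return (matched_name, kind); kind is 'exact', 'fuzzy' or None."""
    if token in names:
        return token, "exact"
    # Approximate matching tolerates small spelling differences.
    close = get_close_matches(token, names, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, None


# Spelling variant: an exact lookup alone would miss this mention.
print(lookup("Caraccas"))
# Term/name equality: "Essen" is a German city and also the German word
# for a meal, so an exact hit alone cannot tell which sense is meant.
print(lookup("Essen"))
# Ordinary word with no similar entry in the name list.
print(lookup("Reise"))
```

The second call shows why name matching alone is not enough: the lookup reports a hit whether the text refers to the city or to a meal, so the surrounding context would have to decide.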