Norway was under Nazi occupation from 1940 to 1945. Although the Wehrmacht’s behavior was less brutal in Norway than in many other European countries, we still had our share of victims.
This year marks the 75th anniversary of the invasion and the 70th anniversary of the liberation. During the winter, an ambitious idea emerged: Would it be possible to map every Norwegian who perished as a victim of the Second World War?
It quickly became clear that our main source would be “Våre falne”, a four-volume book series released by the Norwegian government in 1949, with information about 11,724 Norwegian victims of the war. The books included short biographies and pictures of (almost) every person. The data was not digitized in any way.
Luckily for us, the Genealogy Society of Norway (DIS Norge) had digitized parts of the volumes, and we quickly reached an agreement to share the data. The database contained names and key information, but not the biographies or images.
We then asked the National Library of Norway for a digital copy. At first we were told that the copyright to the biographies belonged to the authors of the volumes. We claimed that the books as a whole should be in the public domain according to the Norwegian Freedom of Information Act (offentleglova), as they were planned and published by the Norwegian Government and financed by the Norwegian municipalities. The Library’s legal department spent one hour considering our claim, and then basically answered “Yes, you’re right, we just haven’t thought of it that way before.”
They gave us all the pages as JPGs and, as a bonus, their XML version of each page. Each page had been run through the National Library’s text-recognition software. Suddenly we had a digital version of the biographies and a JPG of each page.
A core developer at VG got hold of the JPGs, and within 24 hours he had written a Python script that detected the images on each page and, using open source text recognition (Tesseract from Google), linked 70% of the images directly to a person.
But this was not enough. We also wanted to geolocate as many individuals as possible.
From the Norwegian Warsailors association we were given permission to use pictures and the stories of each ship that was sunk during the war. We combined these data with information from the excellent site Warsailors in order to complete our map. German submarine captains neatly logged the position of the ships they sank. This information wasn’t available to the authors of “Våre falne” in 1949, but it is now. Most of this was added through geolocation in Mapbox and scripts that extracted the positions from plain text. Almost all the red dots in the Atlantic Ocean on our map are actual locations, not just approximations.
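To illustrate what extracting a position from plain text can look like, here is a minimal Python sketch. It is not the script we used: the coordinate format (degrees and minutes followed by N/S and E/W), the function name `extract_position` and the sample sentence are all assumptions made for the example.

```python
import re

# Assumed format: "59° 30'N, 07° 15'W" (degrees, minutes, hemisphere letter).
# Real log entries may be written differently and would need other patterns.
POSITION = re.compile(
    r"(\d{1,2})°\s*(\d{1,2})'\s*([NS])[\s,/]*(\d{1,3})°\s*(\d{1,2})'\s*([EW])"
)


def extract_position(text):
    """Return (latitude, longitude) in decimal degrees, or None if not found."""
    match = POSITION.search(text)
    if match is None:
        return None
    lat_deg, lat_min, north_south, lon_deg, lon_min, east_west = match.groups()
    latitude = int(lat_deg) + int(lat_min) / 60
    longitude = int(lon_deg) + int(lon_min) / 60
    # Southern latitudes and western longitudes are negative by convention.
    if north_south == "S":
        latitude = -latitude
    if east_west == "W":
        longitude = -longitude
    return round(latitude, 4), round(longitude, 4)


# Hypothetical sentence, not taken from any real ship record.
sample = "Torpedoed and sunk in position 59° 30'N, 07° 15'W."
print(extract_position(sample))
```

Returning `None` when no coordinates are found makes it easy to collect the remaining records for manual geolocation.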
One huge problem still remained, which programming could not solve. Our biographies were machine-read text, and looked sloppy. Some of the issues we could solve with scripting (like changing every instance of iBBi to 1881), but it was clear to us that we would have to proofread all of the 11,724 biographies.
At the same time, we had to take into account that some people would be missing from the lists, so we added a feature for that. Last but not least, 30% of our profile pictures were not connected to a person. This was partly because of people having the same name (e.g. seven men called Ole Olsen), partly because our machine reading was not foolproof, and partly because of typos in our source data.
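A fix like iBBi to 1881 can be generalized a little. The Python sketch below is an illustration, not our actual script: it assumes a small table of letters that the text recognition confuses with digits, and it only rewrites four-character tokens that turn into a plausible year. The table, the year range and the name `fix_years` are assumptions for the example.

```python
import re

# Assumed confusions: letters the OCR may produce in place of a digit.
CONFUSIONS = str.maketrans(
    {"i": "1", "I": "1", "l": "1", "B": "8", "O": "0", "o": "0"}
)

# Four-character tokens made up only of digits and confusable letters.
TOKEN = re.compile(r"\b[0-9iIlBOo]{4}\b")


def fix_years(text, first=1800, last=1945):
    """Rewrite tokens such as 'iBBi' to '1881' when the result is a plausible year."""

    def replace(match):
        token = match.group(0)
        candidate = token.translate(CONFUSIONS)
        # Only change tokens that were actually garbled and that fall in the range.
        if candidate != token and first <= int(candidate) <= last:
            return candidate
        return token

    return TOKEN.sub(replace, text)


print(fix_years("born iBBi in Oslo"))
```

Restricting the result to a year range keeps ordinary words such as “Bill” untouched, but a script like this can still guess wrong, which is one reason manual proofreading was needed.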
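The same-name problem can be sketched in a few lines of Python. This is a hypothetical illustration, not the code we ran: it assumes each picture has a machine-read caption with a name, uses the standard library’s `difflib` for fuzzy matching, and refuses to link a picture when several people share the matched name. The function names, the cutoff of 0.85 and the sample records are made up for the example.

```python
import difflib
from collections import defaultdict


def build_index(people):
    """Map each lower-cased name to the ids of everyone who carries it."""
    index = defaultdict(list)
    for person_id, name in people:
        index[name.lower()].append(person_id)
    return index


def link_caption(caption, index, cutoff=0.85):
    """Return (person_id, status) for a machine-read caption."""
    matches = difflib.get_close_matches(
        caption.lower(), list(index), n=1, cutoff=cutoff
    )
    if not matches:
        return None, "no match"
    ids = index[matches[0]]
    if len(ids) > 1:
        # Several people share this name, so a human has to decide.
        return None, "ambiguous"
    return ids[0], "linked"


# Hypothetical records: (id, name).
people = [(1, "Ole Olsen"), (2, "Ole Olsen"), (3, "Kari Nordmann")]
index = build_index(people)
print(link_caption("Kari Nordmarn", index))  # small OCR error in the caption
print(link_caption("Ole Olsen", index))
```

Pictures that come back as “ambiguous” or “no match” are the ones that end up in a manual queue.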
To make the job as easy as possible, we created an extensive toolbox for proofreading, editing and creating new profiles.
When we published the feature on the morning of 9 April 2015, it immediately got a lot of attention. It is one of the most read features in VG’s history, with 1.3 million unique visitors (out of a population of 5 million people). 2.7 million profiles have been viewed, and the feature has brought a lot of unknown Norwegian war history to light.
The feedback has been almost entirely positive. Our readers have embraced the feature and helped us complete the database. We have made more than 2,000 additions and changes based on reader feedback.
The database now consists of 11,893 Norwegians who perished as a direct result of the Second World War. The complete dataset will be given back to the Genealogy Society of Norway and open sourced.