Continuous reporting on Senate expenses in Argentina has turned almost 37,000 hard-to-access scanned-image PDFs into a live source of stories that keeps the Senate accountable, not only through journalism but also through citizens. We first used more than six OCR engines and then worked in open collaboration with more than 500 volunteers, including NGOs and universities, who were invited by the LA NACION data team to take part in “civic marathons”. In two months our volunteers helped convert 6,700 PDFs, covering three years of Senate expenses, into a structured, comprehensible dataset. We were able to achieve this, and to produce front-page-quality reporting from the dataset, thanks to a platform developed by our Knight-Mozilla OpenNews fellows. We named it “Vozdata” (from giving voice to the data), and it was inspired by ProPublica's “Free the Files” and The Guardian's “MP's Expenses” project.
After finding out that the Senate had published its expense reports since 2004 as raw PDFs, some of them as scanned images and completely unstructured, the LA NACION data team managed to scrape, transform, normalize and structure three datasets into one. We then began an interrogation of the data that produced front-page stories, drew replies from Argentina's current and former vice presidents (who serve as Senate presidents), and prompted a judicial investigation of Vice President Amado Boudou over these expenses. This series of front-page stories led to more stories and to different approaches to keeping the Senate accountable.
When we converted these PDFs into OCR text files using more than six different OCR engines, we realized that a lot of information had been lost as a consequence of very noisy PDFs. We also realized that there would be more stories if more eyes helped us classify and enter this data. So we decided to ask for help: inspired by The Guardian's MP's Expenses and ProPublica's Free the Files, we asked our 2013 Knight-Mozilla OpenNews Fellow, Manuel Aristaran, to help us develop “Vozdata”, a platform for crowdsourcing data in a structured way.
He developed the platform, named “Crowdata”, together with Gabriela Rodriguez, our 2014 OpenNews Fellow, and in June 2014 we open-sourced it.
We launched our first Vozdata project, “Senate Expenses”, with a dataset of more than 6,700 PDFs that took two months to process. To get this done we again asked for open collaboration and activated our community by organizing two “Civic Marathons” with NGOs, universities and users who volunteered.
At the same time, one of our journalists heard that the number of Senate employees had grown sharply. Because our data team had been scraping the lists of permanent, temporary and contracted Senate employees for 30 months (since November 2011), we were able to publish a unique and original analysis, a new finding backed by data and visualizations. In this period the number of Senate employees and contractors went from 3,700 to 5,700, a growth of 55%. Again, Vice President Amado Boudou replied to these articles through the official channel on national TV, but he could not refute any of the numbers in the reporting.
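As a quick check of the growth figure, the percentage change can be recomputed from the two rounded headcounts quoted in the text. This is only a sketch: the function name `percent_growth` is ours, and the exact headcounts behind the published 55% are not given here, so the rounded inputs yield a slightly lower value of about 54%.

```python
def percent_growth(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("growth is undefined when the starting value is 0")
    return (after - before) / before * 100.0


# Rounded headcounts quoted in the text (not the exact scraped totals).
growth = percent_growth(3700, 5700)
print(f"{growth:.1f}%")  # prints "54.1%"
```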
Because we built this dataset from scratch and analysed the dates, we also found that some official trip expenses were filed with overlapping dates, and that they even included some trips that were never made.
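One way to run this kind of date check on a structured dataset is sketched below. It is an illustration, not the method the data team used: the `Trip` record, its field names and the sample rows are all hypothetical. The idea is to sort each traveler's trips by start date and flag any trip that begins before an earlier trip has ended.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class Trip:
    # Hypothetical record layout; the real dataset's fields are not described.
    traveler: str
    start: date
    end: date  # inclusive


def find_overlaps(trips):
    """Pair each trip with the latest-ending earlier trip it overlaps."""
    overlaps = []
    ordered = sorted(trips, key=lambda t: (t.traveler, t.start, t.end))
    for _, group in groupby(ordered, key=lambda t: t.traveler):
        latest = None  # earlier trip with the latest end date so far
        for trip in group:
            if latest is not None and trip.start <= latest.end:
                overlaps.append((latest, trip))
            if latest is None or trip.end > latest.end:
                latest = trip
    return overlaps


# Invented sample rows, for illustration only.
sample = [
    Trip("Traveler A", date(2013, 3, 1), date(2013, 3, 5)),
    Trip("Traveler A", date(2013, 3, 4), date(2013, 3, 8)),  # overlaps the first
    Trip("Traveler A", date(2013, 4, 1), date(2013, 4, 2)),
    Trip("Traveler B", date(2013, 3, 4), date(2013, 3, 6)),  # other traveler
]

for earlier, later in find_overlaps(sample):
    print(earlier.traveler, earlier.start, earlier.end, "overlaps", later.start, later.end)
```

A check like this only produces candidates. Each flagged pair would still need to be verified against the original documents before being reported.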
In the analysis and reporting stage we worked with three journalists from the Politics section: Laura Serra, Ivan Ruiz and Maia Jastreblansky. All these stories came out of the same dataset and follow up on last year's stories (this project and its stories won the 2013 Data Journalism Awards).
So for us, keeping the dataset up to date is like keeping a live source talking to us, and the new stories again come from the Vice President's expenses and from other senators' expenses.
Besides a home page for the whole series of stories, some of this year's high-impact data stories were:
We decided to build a home page by integrating a data topic tag into our CMS, so that we could gather and present all the Senate expenses stories drawn from this dataset, as well as the impact of the investigation in the courts.
The Senate expenses project is part of our strategy to bring data to life and to help journalists and citizens explore the details and stories hidden in data. We think citizen collaboration for knowledge extraction and for keeping governments accountable is just starting. Showing citizen volunteers how their collaboration changes how politicians spend, or improves transparency, is crucial for media to gain the credibility that will lead to more open collaboration.
LA NACION has also requested the expense reports of Congressmen and of the City of Buenos Aires Legislature, and we are now awaiting an answer. In the meantime we are going to open Vozdata II, covering Senate expenses until 2014, to follow up on this project.
In a country without a FOIA law and ranked 106th out of 175 in the Corruption Perceptions Index, LA NACION believes that media must be proactive and open data as an example, to promote a change towards transparency and innovation: open data, share it, and report on it with the best-quality journalism.