Data! Data! Data! I can’t make bricks without clay!
– Sir Arthur Conan Doyle
Like Sherlock Holmes, human rights defenders rely on data to draw conclusions for their advocacy work. In this spirit, ShareLab and Tactical Tech organised the Data Investigation Camp to share skills, tools and methodologies, and to build a community able to reflect critically on the use and impact of data.
Around 65 people with diverse backgrounds, amongst them journalists, activists, lawyers, data analysts and visualisers, researchers and security experts, gathered in an old monastery in the Montenegrin Bay of Kotor.
This blog post gives a glimpse of what we have learned and the discussions that came up. We will walk through the main steps in a data project and give some pointers to helpful tools or methodologies.
A shared passion for data is what brought us together, but the way people work with data and approach a new project differs from person to person. Often it starts with an idea, a question or an issue to investigate or to collect evidence for. This requires data. The type of data varies (e.g. text, images, videos, maps, statistics, sensor data), as do the ways of obtaining it, which range from interviewing people and searching through archives and libraries to crawling online resources such as government pages, newspapers and social media.
Researchers and activists often rely on technical staff to collect online data. However, tools such as Littlefork and Postman provide a simple way to obtain structured information from Google, Twitter, YouTube, etc. and to interact with APIs.
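For readers curious about what a tool like Postman does under the hood, here is a minimal Python sketch of the two halves of an API call: encoding a query into a request URL, and pulling a few fields out of the JSON response. The endpoint, the parameter names and the response shape (an "items" list) are hypothetical; every real API, from Twitter to YouTube, defines its own, and most also require an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.org/v1/search"


def build_query_url(base_url, **params):
    """Encode the query parameters into a GET request URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch(url, timeout=10):
    """Send the GET request and return the response body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def extract_records(response_body, fields):
    """Keep only the fields of interest from each entry in the (assumed) 'items' list."""
    payload = json.loads(response_body)
    return [{field: item.get(field) for field in fields} for item in payload.get("items", [])]


# Made-up response in the shape this hypothetical API is assumed to return.
SAMPLE_RESPONSE = """
{"items": [
  {"id": "a1", "title": "Video of the protest", "published": "2017-08-20", "views": 1520},
  {"id": "b2", "title": "Official statement", "published": "2017-08-21", "views": 310}
]}
"""

if __name__ == "__main__":
    print(build_query_url(BASE_URL, q="human rights", lang="en"))
    for record in extract_records(SAMPLE_RESPONSE, ["id", "published"]):
        print(record)
```

Against a real service, you would pass the text returned by fetch(url) to extract_records instead of the sample response.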
How verification is done depends on the data source and the project goal, but it is crucial to ensure the correctness and validity of the analysis and to demonstrate trustworthiness to the public. The EXIF metadata embedded in an image file (which you can view on Ubuntu via right click → Properties → Image) can reveal when, where and with what device a picture was taken. Google Reverse Image Search, the YouTube Data Viewer from Amnesty International and Check can also give insights into the original online publication date and source.
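GPS positions in EXIF are stored as degrees, minutes and seconds plus a hemisphere reference, while most online maps expect signed decimal degrees. The small Python helper below (a sketch; the function name is our own) does that conversion so a claimed location can be checked on a map. Keep in mind that many platforms strip EXIF on upload, so missing metadata is common and proves nothing on its own.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF GPS (degrees, minutes, seconds) triple to signed decimal degrees.

    Each component may be an int, a Fraction or a rational string such as "46/1",
    which is how some tools display the raw EXIF values. `ref` is the matching
    reference tag: "N"/"S" for latitude, "E"/"W" for longitude.
    """
    ref = ref.strip().upper()
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected GPS reference: {ref!r}")
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # Southern latitudes and western longitudes are negative in decimal notation.
    return float(-value if ref in ("S", "W") else value)


if __name__ == "__main__":
    # Made-up coordinates near the Bay of Kotor.
    lat = dms_to_decimal(42, 25, 30, "N")
    lon = dms_to_decimal("18/1", "46/1", "12/1", "E")
    print(lat, lon)
```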
After verification, the data is cleaned and converted into a format that can be processed. Python and R are popular programming languages for data analysis, and many tutorials on basic exploratory analysis are available, including the pandas tutorial for Python and the R Data guide.
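To make "cleaning" concrete, here is a small Python sketch using only the standard library (the pandas tutorial covers the same ideas with DataFrames). The data and column names are made up; the code shows typical fixes: trimming stray whitespace, harmonising spelling, parsing numbers with thousands separators, and keeping missing values as explicit gaps rather than silently turning them into zeros.

```python
import csv
import io
from collections import Counter
from statistics import median

# Made-up messy export: stray spaces, inconsistent capitalisation,
# a missing value, a blank line and a thousands separator.
RAW = """city , reports ,date
 Podgorica,12,2017-08-01
kotor, 3 ,2017-08-02
KOTOR,,2017-08-03

Budva,"1,204",2017-08-04
"""


def clean_rows(text):
    """Parse CSV text into a list of dicts with normalised names and numbers."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip().lower() for name in next(reader)]
    rows = []
    for values in reader:
        if not any(value.strip() for value in values):
            continue  # skip blank lines
        row = dict(zip(header, (value.strip() for value in values)))
        raw_count = row["reports"].replace(",", "")
        rows.append({
            "city": row["city"].title(),
            "reports": int(raw_count) if raw_count else None,  # keep gaps visible
            "date": row["date"],
        })
    return rows


def summarise(rows):
    """A first exploratory look: size, gaps, rows per city, median."""
    counts = [row["reports"] for row in rows if row["reports"] is not None]
    return {
        "rows": len(rows),
        "missing_reports": sum(row["reports"] is None for row in rows),
        "rows_per_city": Counter(row["city"] for row in rows),
        "median_reports": median(counts) if counts else None,
    }


if __name__ == "__main__":
    print(summarise(clean_rows(RAW)))
```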
Visualisations help us understand the data and make results more accessible and interesting to a wider public. RAWGraphs provides an intuitive framework for turning spreadsheets into colorful graphs, and Gephi, an open-source tool for network and graph visualisation, was also introduced. But don’t forget: beautiful graphs are nice, but the content is what actually counts.
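Before data reaches Gephi, it usually has to be shaped into a network. As a rough sketch, assuming your records are pairs of actors who interacted, the Python below merges repeated pairs into weighted edges and renders them as a CSV edge table. Gephi's spreadsheet import recognises Source and Target columns, with Weight as an optional extra; the actor names are invented.

```python
import csv
import io
from collections import Counter


def weighted_edges(interactions):
    """Collapse repeated interactions between two actors into one undirected, weighted edge."""
    counts = Counter(tuple(sorted(pair)) for pair in interactions)
    return sorted(counts.items())


def edges_to_csv(edges):
    """Render edges as a CSV edge table with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in edges:
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical actors and interactions (e.g. contracts, meetings, payments).
INTERACTIONS = [
    ("Ministry", "Company A"),
    ("Company A", "Ministry"),
    ("Company A", "Bank X"),
    ("NGO", "Ministry"),
]

if __name__ == "__main__":
    print(edges_to_csv(weighted_edges(INTERACTIONS)), end="")
```

Saving the printed text as a .csv file gives you something to load through Gephi's spreadsheet import.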
Beyond basic data analysis, researchers and human rights activists want to understand complex systems, identify main actors and their network of interaction, track supply chains, uncover hidden patterns, and even predict future outcomes or behaviour.
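One simple way to start identifying main actors is degree centrality: the fraction of all other actors that each actor is directly connected to. The Python sketch below computes it by hand on a hypothetical network; libraries such as networkx provide the same measure (networkx.degree_centrality) along with more refined ones. A high score flags an actor worth a closer look; it is not proof of influence.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Share of all other actors each actor is directly connected to (undirected)."""
    neighbours = defaultdict(set)  # sets mean duplicate edges are counted once
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(links) / (n - 1) for node, links in neighbours.items()}


def rank_actors(centrality):
    """Most connected actors first; ties broken alphabetically."""
    return sorted(centrality.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical network of actors.
EDGES = [
    ("Ministry", "Company A"),
    ("Company A", "Bank X"),
    ("Company A", "Shell Co"),
    ("NGO", "Ministry"),
]

if __name__ == "__main__":
    for actor, score in rank_actors(degree_centrality(EDGES)):
        print(f"{actor:10} {score:.2f}")
```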
Machine learning methods offer great possibilities to support these goals and to reduce manual effort. However, they also pose new challenges: human bias present in the training data can be reinforced, and the decision-making process is often difficult to understand.
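A deliberately oversimplified Python illustration of how bias gets reinforced: a "model" that learns each group's historical flag rate and flags new cases above a threshold. The groups, history and threshold are invented. If group B was flagged in 70% of past cases, perhaps because of biased past decisions, the model flags every new B case and no A case, turning a skew in the data into a hard rule.

```python
from collections import defaultdict


def fit_group_rates(history):
    """'Train' a toy model: the share of past cases that were flagged, per group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, was_flagged in history:
        totals[group] += 1
        flagged[group] += int(was_flagged)
    return {group: flagged[group] / totals[group] for group in totals}


def predict(rates, group, threshold=0.5):
    """Flag a new case if its group's historical rate exceeds the threshold."""
    return rates.get(group, 0.0) > threshold


# Made-up history: group B was flagged more often in the past,
# for whatever reason (including biased past decisions).
HISTORY = [("A", i < 3) for i in range(10)] + [("B", i < 7) for i in range(10)]

if __name__ == "__main__":
    rates = fit_group_rates(HISTORY)
    new_cases = ["A"] * 5 + ["B"] * 5
    flagged = [group for group in new_cases if predict(rates, group)]
    print(rates, flagged)
```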
For any data project it is important to keep the bigger picture in mind. This includes data awareness: how is the data gathered, what is captured and, more importantly, what is missed, and how is the data influenced by human bias or machine-based selection?
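Part of this awareness can be made routine. As a small sketch with invented records and field names, the Python below measures how often each field is empty and how many records fall into each group you would expect to see. A region with zero records, like "South" here, is a gap in collection, not evidence that nothing happened there; similarly, three of the four records come from a single source.

```python
from collections import Counter


def missing_share(records, fields):
    """Fraction of records where each field is absent or empty."""
    if not records:
        raise ValueError("no records to audit")
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) in (None, "")) / total
        for field in fields
    }


def coverage_by_group(records, group_field, expected_groups):
    """Number of records per expected group; zeros reveal blind spots."""
    seen = Counter(r.get(group_field) for r in records)
    return {group: seen.get(group, 0) for group in expected_groups}


# Invented records: one empty and one absent "age" field.
RECORDS = [
    {"region": "North", "age": "34", "source": "police"},
    {"region": "North", "age": "", "source": "police"},
    {"region": "Coast", "age": "51", "source": "ngo"},
    {"region": "North", "source": "police"},
]

if __name__ == "__main__":
    print(missing_share(RECORDS, ["region", "age", "source"]))
    print(coverage_by_group(RECORDS, "region", ["North", "Coast", "South"]))
    print(Counter(r["source"] for r in RECORDS))
```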
Beyond the limitations of the data, it is important to reflect critically on the possible harmful impacts of a project and its products, as well as on the ethical questions that may arise.
Another main topic affecting all stages of a project is security, both physical and digital. Especially for small organisations, being aware of security threats and keeping track of all possible countermeasures can be quite overwhelming. However, a detailed risk analysis that covers the context an organisation is working in, the data it is dealing with, the institutions it is opposing, and the consequences of data loss gives a clearer picture of the actual security threats it faces. The necessary security measures can then be narrowed down and adapted specifically to the organisation's needs.
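One common way to structure such a risk analysis, which may or may not match the approach used at the camp, is to rate each threat's likelihood and impact on a small scale and prioritise by their product. In the hypothetical Python sketch below, the context and the opponents would inform the likelihood, while the sensitivity of the data and the consequences of its loss would inform the impact; the threats, scores and threshold are all invented.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # 1 (rare) .. 5 (almost certain)
    impact: int      # 1 (minor) .. 5 (severe)

    def __post_init__(self):
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def prioritise(threats, threshold=10):
    """Return threats scoring at or above the threshold, highest score first."""
    return sorted((t for t in threats if t.score >= threshold), key=lambda t: (-t.score, t.name))


# Invented threats for a hypothetical small organisation.
THREATS = [
    Threat("laptop seized at border", likelihood=2, impact=5),
    Threat("phishing of staff email", likelihood=4, impact=4),
    Threat("office burglary", likelihood=1, impact=3),
    Threat("loss of interview recordings without backup", likelihood=3, impact=5),
]

if __name__ == "__main__":
    for threat in prioritise(THREATS):
        print(threat.score, threat.name)
```

The threshold is a judgement call: lowering it widens the list of measures to plan for, raising it focuses limited resources on the most pressing threats.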
Demystifying technology and tools
Within the five days of the Data Investigation Camp, we covered a wide range of topics and created an excellent atmosphere for sharing skills, holding discussions and building a community. Personally, I learned a lot, especially from the practical sessions, which not only let us play around with new tools but also showed that, with interest and a little help, we can learn to use them, no matter how scary and unintuitive they look at first sight.
Overcoming the fear of technology, demystifying complex methods and building a community to exchange knowledge are important steps that empower researchers and activists as well as developers and data analysts.
It was such a pleasure to be part of this diverse group of people, grasp a tiny piece of their expertise, experience their passion for their work, and get inspired by their ideas and ways of thinking.
Even though this blog post was all about data, tools and the opportunities that come with them, let’s keep in mind that data is not the answer to all of our problems…
Not everything that can be counted counts,
and not everything that counts can be counted.
– William Bruce Cameron