Struggles of machine learning practitioners in the human rights field


Below is a list of questions to serve as a starting framework for the discussion in this thread:

  • How to bridge the gap between ML practitioners and human rights defenders?
  • How to give the user control over the ML algorithm without overextending them?
  • How to get more machine learning practitioners to work actively in the field of human rights?
practitioners community

As most machine learning practitioners in the field of human rights work in small teams, I am very interested in how you keep up with advancements, new methods and literature. Who do you reach out to with questions regarding the applicability of algorithms or their implementation and parameter settings? Do you collaborate with (other) university research groups?


Comment originally posted by Enrique Piracés

One thing that I see is that there is not a large group of clearly defined roles and projects around ML and HR. I may be wrong, but besides a few projects that come from organizations working on data or organizations that are experimenting or prototyping with ML, it seems the use of ML in HR is in its infancy (which I see as an opportunity). From the experience of working with a few computer scientists, what I have seen is that there is a big gap in the understanding of ML by HR practitioners, as well as a big gap in the understanding of HR practice by ML experts/enthusiasts. One thing I have seen succeed (even if it is difficult to scale up) is to explain not big future problems or ideal uses of ML, but rather the cumbersome, onerous aspects of human rights work that could potentially be made more efficient with ML. Just a thought for now.

struggles of ML practitioners

Comment originally posted by Natalie Widmann

I totally agree with you, Enrique! There is this mutual gap in understanding, which we can only overcome if we have an open and diverse dialogue, but also long-term projects in which machine learning practitioners, human rights activists, as well as lawyers, software developers and UX designers work closely together to create a mutual understanding.
Will, you wrote a blog post on the first steps of integrating machine learning into Ushahidi. Which problems (and solutions) did you encounter?


Issues with Data

Comment originally posted by Bill Doran

One of the major issues we encountered was the rarity of data, the issues around annotation and the sparsity of some parts of the data. For Ushahidi specifically, a lot of the data that would have been interesting was in a relatively sparse format (primarily SMS or Twitter). Very short data blobs that are not particularly succinct didn't do very well in classification. Most of those fared better in entity extraction and annotation/semantic linking to DBpedia.

We've started to work to integrate the HDX HXL format so that our users can add tags and attributes to the data to help with annotation and then export the data to HDX itself to make it - where appropriate - publicly available.

A major concern that I have is the upper limits that we're going to face - it's not practical to take a mechanical automaton model of having people annotate a large corpus. Even if we have data at that scale, there is significant processing power required in building models from large datasets. It seems, I think, that one route to attack the problem lies in some form of transfer learning, but this is still likely to be highly domain-specific and would require a lot more fundamental research.

Are there any academics here who are working with large, complex data? How did feature extraction work in terms of text and semantics? What tuning was undertaken? Is representational learning practical? It still seems there is a gap between what is accessible now in terms of supervised learning and what will hopefully become available in the near future in terms of semi-supervised learning.


Issues with Data

Comment originally posted by Natalie Widmann

yeeesss! Small and unbalanced training data is a fundamental problem when training machine learning algorithms, especially in a new domain.

Regarding text classification, we have had good experiences with the TensorFlow Universal Sentence Encoder. Based on a sentence encoder, the similarity of two texts can be computed. In your case it could be used either directly for classification or to increase the number of labeled messages by identifying similar content from other resources...
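
To make the similarity idea concrete, here is a minimal sketch in Python. It is not the Universal Sentence Encoder: `embed` is a hypothetical toy stand-in (a bag-of-words count vector) so that the example runs without downloading a model, and `most_similar` and the sample messages are made up for illustration. With a real sentence encoder you would swap `embed` for the model's embedding call and keep the cosine similarity step.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return (candidate, score) pairs sorted from most to least similar."""
    q = embed(query)
    scored = [(c, cosine_similarity(q, embed(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# One labeled message and a pool of unlabeled ones (invented examples).
labeled = "bridge collapsed near the market"
unlabeled = [
    "the bridge near the market collapsed",
    "water shortage reported in the north",
]
ranked = most_similar(labeled, unlabeled)
print(ranked[0][0])  # the unlabeled message closest to the labeled one
```

Unlabeled messages that score close to an already labeled one could then be proposed as candidates for the same label, ideally with a human checking them.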

Another idea (not sure if it fits your purpose) is to use active learning, such that only sentences with a low prediction probability (the ones which are very likely to be misclassified) are passed to human annotators. This leads to a faster adaptation of the classifier and reduces the manual labeling effort.
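
A rough sketch of that selection step, assuming the classifier already returns one probability per class for each message. The function name `select_for_annotation`, the 0.6 threshold and the probabilities are all invented for illustration, and this is only one simple variant of active learning (least-confidence sampling).

```python
def select_for_annotation(probabilities, threshold=0.6):
    """Return indices of items whose top class probability is below
    the threshold, ordered from least to most confident."""
    uncertain = [
        (max(probs), i)
        for i, probs in enumerate(probabilities)
        if max(probs) < threshold
    ]
    return [i for _, i in sorted(uncertain)]


# Hypothetical class probabilities for four messages (two classes).
predicted = [
    [0.95, 0.05],  # confident, keep the automatic label
    [0.55, 0.45],  # uncertain
    [0.10, 0.90],  # confident
    [0.51, 0.49],  # most uncertain
]
to_label = select_for_annotation(predicted)
print(to_label)  # [3, 1]
```

Only the messages at the returned indices would go to human annotators. After they are labeled, the classifier is retrained and the selection is repeated.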



Comment originally posted by Vivian Ng

The Fairness, Accountability and Transparency in Machine Learning community has a lot of resources and work in this area, which I think is a good springboard for thinking about the intersection of machine learning and human rights. The language of fairness, accountability and transparency resonates with some fundamental elements within the human rights framework, and this connects to the broader conversation of how design and development processes of machine learning systems can and should be compatible with human rights. We already see crossovers in terms of human rights experts working with or advising developers, and also various civil society organisations bringing onboard experts in machine learning, AI etc.

connecting NGOs to university groups/students

Comment originally posted by Natalie Widmann

I think an important step to overcome the communication and knowledge gap between human rights defenders and machine learning practitioners is to have an ongoing dialogue. Universities play an important part here.

I would love to strengthen the connection between university groups and NGOs in order to give students the opportunity to apply their skills to relevant problems apart from industry and academia, and to show human rights defenders the possibilities of new technologies.

Of course, this requires resources to prepare useful but also feasible projects, to provide clean data, and to integrate students into the workflow and way of thinking of human rights organisations. Does anyone have experience with such student projects/internships, or an idea of how to implement them, e.g. having a pool of interested NGOs that already have some sample projects and a pool of students from suitable university programs to match? Who would be interested in a pilot project? :)



Comment originally posted by Vivian Ng

Have you seen the Mozilla program, Natalie?



Comment originally posted by Natalie Widmann

that's interesting! Thanks for sharing, Vivian.