Advanced Topics: Ethics in AI and Data Science


To understand the contemporary ethical issues in the use of AI and Data Science tools, and to provide techniques for analysing and addressing them.

Learning outcomes

– To know the main ethical concerns involved in designing and using AI and data-centred systems
– To understand the (algorithmic) solutions proposed to design ethical concerns into AI and data analytics algorithms
– To analyse the consequences of ethical decisions in AI and autonomous decision-making systems
– To analyse and implement these decisions in practice


1. Ethics for Subsymbolic AI

1.1. Privacy

– Privacy Concerns
– Differential Privacy

1.2. Fairness and bias

– Concepts and definitions of fairness
– Impossibility results, Pareto optimal solutions

1.3. Explainability

– Causal approaches to explainability

2. Ethics for Symbolic AI

2.1. Understanding and reasoning about human behaviour

– Game theoretic approaches
– User-centred design


For all of the assessment modules below, students form groups of two. The assessment criteria are judged individually where possible; hence, members of the same group may receive different final marks.

1. Presentations 30%

Each group gives four presentations on four different topics, spread throughout the term. Each group member gives two of the presentations, but both members contribute to every presentation and should answer all questions. Each presentation should cover at least one key paper on the respective subject. Each presentation accounts for 7.5% of the total mark.

Judgment criteria (2.5% each):


Was the presentation well organised?
Was it given at the right level of detail (informative but without being unnecessarily complex)? Was it clear that the student thought about the topics and contributed to the final conclusions?


Did the student speak fluently and clearly, without excessive recourse to notes? Did they use the right level of technical language (sufficiently technical but without an excess of acronyms)? Did they respond well to questions?


Did the student make good use of available technologies (e.g., for slides or tool demos)?
Are the slides well designed?
Do slides support the logical flow of the presentation?
Was the presentation prepared to a professional standard?

2. Term papers 50%

Each group writes four term papers on the topics of their presentations. Each term paper should reflect not only the content covered in the group's own presentation but also the content of all other presentations, and hence will cover multiple papers. Each term paper should be around 2500 words (a guideline, not a hard rule). Each term paper accounts for 12.5% of the total mark.

Judgment criteria:

Logical Structure (2.5%):

Structuring the paper into sections and the sections into paragraphs with a logical flow of information. Proper use of itemised or enumerated lists, figures, and tables. Including an abstract, and proper introduction and conclusions. 

Presentation (2.5%):

Using proper headings for sections. Figures and tables should have appropriate captions and should be referred to in the text.
Typeset properly in LaTeX; you could use an online LaTeX environment such as Overleaf with one of its homework templates. No spelling or grammar mistakes.

Content (5%):

In-depth treatment of the chosen topic: identifying the main message of the paper,  concrete problem definition, providing clear definitions, motivated by illustrative examples, identifying concrete results and exemplifying them.  Critical appraisal of the results.

References (2.5%):

Using scientific references (textbooks, papers appearing in peer-reviewed journals and conferences; avoid referring to Wikipedia; do not use too many references to popular websites). Using a consistent and complete citation and bibliographic style (e.g., any one of the following: Harvard, Chicago, IEEE)

3. Mini-project  20%

Please choose one of the two mini-projects specified below (on explainability and bias). The final deadline and presentations fall on Saturday 27th and Monday 29th of November.



Main book: Michael Kearns and Aaron Roth. The Ethical Algorithm. Oxford University Press. 2020.
Other books:
● Michael Anderson and Susan Leigh Anderson. Machine Ethics. Cambridge University Press, 2011.
● Mark Coeckelbergh, AI Ethics, MIT Press, 2020.
● Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. NOW Publishers, 2014.
● David Edmonds. Would You Kill the Fat Man? Princeton. 2014.
● Shannon Vallor. Technology and the Virtues. Oxford University Press, 2016.

Similar courses:



Further reading / tools (for your term paper and project)


– C. Dwork. Differential privacy. In Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP) (2), pages 1–12. 2006.
– Cynthia Dwork and Kobbi Nissim (2004). Privacy-Preserving Datamining on Vertically Partitioned Databases. In Advances in Cryptology – CRYPTO 2004, 24th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 2004, Proceedings (pp. 528–544). Springer.
– C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. J. Priv. Confidentiality, 7(3), p.17–51. 2016.
– C. Dwork and M. Naor. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. Journal of Privacy and Confidentiality, 2010.
– Anindya De. Lower bounds in differential privacy. In Theory of Cryptography, pages 321–338, 2012.
– Irit Dinur, and Kobbi Nissim 2003. Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA (pp. 202–210). ACM.
– S. Prasad Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A.D. Smith 2011. What Can We Learn Privately?. SIAM J. Comput., 40(3), p.793–826.
– Ilya Mironov 2012. On significance of the least significant bits for differential privacy. In the ACM Conference on Computer and Communications Security, CCS’12, Raleigh, NC, USA, October 16-18, 2012 (pp. 650–661). ACM.
– Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, & Li Zhang (2016). Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016 (pp. 308–318). ACM.
– Cynthia Dwork, & Guy N. Rothblum (2016). Concentrated Differential Privacy. CoRR, abs/1603.01887.

A GitHub repository called Awesome-Privacy categorises some of the most notable studies in this field.

Fairness / bias

– Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan. Semantics derived automatically from language corpora contain human-like biases
– Michael Kearns, Seth Neel, Aaron Roth, Zhiwei Steven Wu. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness
– Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, & Adam Tauman Kalai (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain (pp. 4349–4357).
– Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. S. (2012). Fairness through awareness. In S. Goldwasser (Ed.), Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012 (pp. 214–226). ACM.
– Jon M. Kleinberg, Sendhil Mullainathan, & Manish Raghavan (2017). Inherent Trade-Offs in the Fair Determination of Risk Scores. In 8th Innovations in Theoretical Computer Science Conference, ITCS 2017, January 9-11, 2017, Berkeley, CA, USA (pp. 43:1–43:23). Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
– Michael J. Kearns, Seth Neel, Aaron Roth, & Zhiwei Steven Wu (2019). An Empirical Study of Rich Subgroup Fairness for Machine Learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* 2019, Atlanta, GA, USA, January 29-31, 2019 (pp. 100–109). ACM.
– Alexandra Chouldechova (2017). Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data, 5(2), 153–163.
– Aylin Caliskan Islam, Joanna J. Bryson, & Arvind Narayanan (2016). Semantics derived automatically from language corpora necessarily contain human biases. CoRR, abs/1608.07187.
– Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. In A. Axelrod, D. Yang, R. Cunha, S. Shaikh, & Z. Waseem (Eds.), Proceedings of the 2019 Workshop on Widening NLP@ACL 2019, Florence, Italy, July 28, 2019 (pp. 60–63). Association for Computational Linguistics.
– Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, & Ed H. Chi (2020). Fairness without Demographics through Adversarially Reweighted Learning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual.
– David Madras, Elliot Creager, Toniann Pitassi, & Richard S. Zemel (2018). Learning Adversarially Fair and Transferable Representations. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018 (pp. 3381–3390). PMLR.
– Brian Hu Zhang, Blake Lemoine, & Margaret Mitchell (2018). Mitigating Unwanted Biases with Adversarial Learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, AIES 2018, New Orleans, LA, USA, February 02-03, 2018 (pp. 335–340). ACM.

Two GitHub repositories that categorise articles by subject, venue, and publication date:


Explainability

– Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. Building high-level features using large scale unsupervised learning.
– Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg 2021. Amnesic Probing: Behavioral Explanation With Amnesic Counterfactuals. Trans. Assoc. Comput. Linguistics, 9, p.160–175.
– Michael Hind et al. Teaching AI to Explain its Decisions. Proc. of AIES 2019.
– Karthik S. Gurumoorthy et al. Efficient Data Representation by Selecting Prototypes with Importance Weights. Proc. ICDM 2019.
– Marco T. Ribeiro, S. Singh, and C. Guestrin. Why Should I Trust You? Explaining the Predictions of Any Classifier. Proc. KDD 2016.
– Scott Lundberg and Su-In Lee. A Unified Approach to Interpreting Model Predictions. Proc. NIPS 2017.

Algorithmic Game Theory

– Michael J. Kearns, Mallesh M. Pai, Aaron Roth, & Jonathan R. Ullman (2014). Mechanism design in large games: incentives and privacy. In Innovations in Theoretical Computer Science, ITCS’14, Princeton, NJ, USA, January 12-14, 2014 (pp. 403–410). ACM.
– Justin Hsu, Aaron Roth, & Jonathan R. Ullman (2013). Differential privacy for the analyst via private equilibrium computation. In Symposium on Theory of Computing Conference, STOC’13, Palo Alto, CA, USA, June 1-4, 2013 (pp. 341–350). ACM.
– D. Gale, & L. S. Shapley (2013). College Admissions and the Stability of Marriage. Am. Math. Mon., 120(5), 386–391.
– Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, & Yoshua Bengio (2014). Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada (pp. 2672–2680).
– Beaulieu-Jones, B., Wu, Z., Williams, C., & Greene, C. (2017). Privacy-preserving generative deep neural networks support clinical data sharing. bioRxiv.

User-centred design tools:


Other resources:

Living with AI Podcasts
Verifiability YouTube Channel
Moral Machine

Project definitions

Project #1 (Explainability)

Shapley additive explanations (SHAP) is a local post-hoc explanation method in
Explainable AI (XAI) that measures the contribution of each input feature to
the model's prediction. A known limitation of this method is that computing the
exact contribution of each feature requires evaluating the model over all
possible orderings of the feature set (equivalently, all feature subsets),
which is computationally expensive: the cost grows exponentially, on the order
of O(2^n), where n is the number of features.
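
To make this cost concrete, here is a minimal Python sketch that computes exact Shapley values by enumerating every feature subset. It is illustrative only: the names exact_shapley and toy_value, and the toy value function itself, are assumptions made for this example and are not part of SHAP or AIX360. The value function stands in for "the model's expected output when only the features in the subset are known".

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n):
    """Exact Shapley values for n features by enumerating all subsets.

    value_fn (hypothetical interface) maps a frozenset of feature indices
    to a number. Returns (phi, number_of_value_fn_calls).
    """
    phi = [0.0] * n
    calls = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of a subset that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
                calls += 2
    return phi, calls


# Toy value function (an assumption for illustration): additive weights
# plus an interaction bonus when features 0 and 1 are both present.
WEIGHTS = [1.0, 2.0, 3.0]


def toy_value(subset):
    bonus = 4.0 if {0, 1} <= subset else 0.0
    return sum(WEIGHTS[j] for j in subset) + bonus


phi, calls = exact_shapley(toy_value, n=3)
print(phi, calls)
```

Without caching, this loop makes n * 2^n calls to the value function, which is why exact computation quickly becomes impractical as the number of features grows.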

In this project, you use COMPAS, a feature-rich dataset, to train an AI model
that estimates the risk of an individual being detained again following their
release from jail. Use IBM AIX360, an open-source toolkit that supports
numerous AI explainability methods, to explain your model’s behavior using the
various methods implemented in this toolkit, particularly the SHAP approach.
Then, develop and implement a solution that speeds up the computation of SHAP
using the IBM AIX360 toolkit. Compare your proposed method to the original
implementation of SHAP.
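
As a possible starting point for the speed-up, one well-known idea is to estimate Shapley values by sampling random feature orderings instead of enumerating all of them. The sketch below is a library-agnostic illustration in plain Python, not the AIX360 or SHAP API; sampled_shapley, toy_value, and the parameter names are hypothetical, and the toy value function is an assumption standing in for a real model.

```python
import random


def sampled_shapley(value_fn, n, num_permutations=2000, seed=0):
    """Monte Carlo estimate of Shapley values from random feature orderings.

    Each sampled ordering costs n + 1 calls to value_fn, so the total cost
    is num_permutations * (n + 1) rather than exponential in n.
    """
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for i in order:
            coalition.add(i)
            current = value_fn(frozenset(coalition))
            phi[i] += current - previous  # marginal contribution of feature i
            previous = current
    return [total / num_permutations for total in phi]


# Toy value function (an assumption for illustration): additive weights
# plus an interaction bonus when features 0 and 1 are both present.
WEIGHTS = [1.0, 2.0, 3.0]


def toy_value(subset):
    bonus = 4.0 if {0, 1} <= subset else 0.0
    return sum(WEIGHTS[j] for j in subset) + bonus


estimates = sampled_shapley(toy_value, n=3)
print(estimates)
```

The trade-off is accuracy for speed: the estimates are noisy, and the number of sampled orderings controls the error. When comparing against the original SHAP implementation, it is worth reporting both runtime and the deviation from the reference values.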

COMPAS Dataset
IBM AIX360 Toolkit
Lundberg et al., A Unified Approach to Interpreting Model Predictions

Project #2 (NLP Fairness)
In the realm of deep learning, proposing unsupervised solutions to problems
that were previously addressed through supervised methods could represent a
significant advancement. Creating models that are resistant to spurious
correlations caused by a defect in the annotation pipeline, for example, is an
active area of research in natural language processing. Many studies have
proposed supervised solutions for this problem, whereas Mahabadi et al. have
proposed an end-to-end solution that is comparable to supervised solutions in
terms of performance.
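
One family of end-to-end debiasing objectives combines the main model with a "bias-only" model in a product of experts, so that examples the bias-only model already predicts well contribute less to training the main model. The plain-Python sketch below illustrates that idea only; it is not the authors' implementation, the function names are hypothetical, and the exact formulation should be checked against the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(logits, target):
    """Standard cross-entropy loss for a single example."""
    return -log_softmax(logits)[target]


def poe_loss(main_logits, bias_logits, target):
    """Product-of-experts loss: add the two log-distributions, renormalise,
    then take the cross-entropy of the combined distribution."""
    combined = [
        a + b
        for a, b in zip(log_softmax(main_logits), log_softmax(bias_logits))
    ]
    return cross_entropy(combined, target)


# With an uninformative bias-only model, the loss reduces to plain cross-entropy.
print(poe_loss([2.0, 0.0], [0.0, 0.0], target=0), cross_entropy([2.0, 0.0], 0))

# When the bias-only model is already confident and correct, the combined
# loss is small, so the main model receives a weaker training signal.
print(poe_loss([0.0, 0.0], [3.0, 0.0], target=0), cross_entropy([0.0, 0.0], 0))
```

In this kind of setup the bias-only model is typically used only during training, and predictions at test time come from the main model alone.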

In a variety of tasks, such as occupation prediction, it has been demonstrated
that large pre-trained language models (PLMs) exhibit human-like bias against women.
In this project, you will utilize the BiosBias dataset, which is designed for
occupation prediction tasks based on an individual’s short biography. Your
objective is to first implement an occupation predictor based on an individual’s
biography using PLMs, then propose and implement a solution, similar to
Mahabadi et al.’s solution, to mitigate bias in your predictor model, and finally
compare your final model to your vanilla model.
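
For the final comparison, one possible bias measure is the gap in true positive rate (TPR) between genders for each occupation. The sketch below assumes predictions, true occupations, and a gender label per biography are available as parallel lists; the function names, the "F"/"M" labels, and the toy data are hypothetical and only illustrate the computation.

```python
def tpr(y_true, y_pred, groups, label, group):
    """True positive rate for one occupation label within one group."""
    hits = 0
    total = 0
    for truth, pred, g in zip(y_true, y_pred, groups):
        if truth == label and g == group:
            total += 1
            hits += int(pred == label)
    return hits / total if total else float("nan")


def tpr_gap(y_true, y_pred, groups, label, group_a="F", group_b="M"):
    """TPR of group_a minus TPR of group_b for one occupation label."""
    return (tpr(y_true, y_pred, groups, label, group_a)
            - tpr(y_true, y_pred, groups, label, group_b))


# Toy data (made up for illustration, not taken from BiosBias).
y_true = ["nurse", "nurse", "nurse", "nurse",
          "surgeon", "surgeon", "surgeon", "surgeon"]
groups = ["F", "F", "M", "M", "F", "F", "M", "M"]
y_pred = ["nurse", "nurse", "nurse", "surgeon",
          "surgeon", "nurse", "surgeon", "surgeon"]

for occupation in ("nurse", "surgeon"):
    print(occupation, tpr_gap(y_true, y_pred, groups, occupation))
```

Computing the same gaps for the vanilla and the debiased predictor, alongside overall accuracy, shows whether the mitigation reduces the disparity and at what cost in performance.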

Mahabadi et al., End-to-End Bias Mitigation by Modelling Biases in Corpora


Many of the resources on this page are prepared by Mahdi Zakizadeh.