White paper: "Ethical AIs Need to Understand Society" - AIs with internal understanding of ethics for proper alignment

A white paper written for my "Debates in Social Data Science" course at the Central European University - March 2024.

The paper argues that the current alignment approach - leaving models unrestricted during training and then aligning them externally post-training to prevent unethical actions - is inadequate because such external correction is ultimately infeasible; "...currently, this does help, but the correct way to fix the problems is not by countersteering towards a better direction, but making the models go in the right direction in the first place".
Instead, we need to invent architectures that learn an internal understanding (a representation, if you like) of ethics and social behaviour. Such models, without needing external tools in the system, could generate more trustworthy results and repel potential attacks by recognizing harmful behaviour.

A system that does not make systematic errors cannot be achieved with our current methods, even with post-training correction tools. To build such a system, we need the model to learn what is good and what is bad, and to develop an internal sense of ethics. This requires fundamentally new methods: ones that learn and adjust as a human would, and that are trained not only to perform well on intelligence benchmarks but to be socially intelligent as well.


Policy suggestion: "A recommendation is turning focus on models that learn to act accordingly from the beginning, particularly models that have potential to learn communicational, social skills and build an internal understanding of social behaviour. Incentives are missing now, therefore policy makers have to ensure that more research goes into social AI."

The white paper is stored in the repository as a PDF file, and an earlier presentation is also available.
