DerOeko/README.md

Hi there!

My academic journey started with a fascination for the brain, leading me through psychology and Philosophy & Cognitive Science. However, I increasingly felt that the ideas and beliefs I gained “don’t pay rent”: they didn't seem to relate to the things I actually care about. This ultimately led me to study AI in depth at Radboud University.

After conducting research on computational models for depression with Roshan Cools at the Donders Institute, I became increasingly motivated, through engagement with EA communities, to prevent catastrophic outcomes from superintelligent AI. Since January 2025, I've dedicated myself to mechanistic interpretability, specifically detecting deception in neural networks, as well as other empirical AI alignment research agendas. My biggest worry is that models might learn internal goals that differ from the intended ones and learn to hide them with superhuman capability through alignment faking or steganographic reasoning.

I was an ARENA Fellow in the 2025 iteration, where I built technical skills and worked on an LLM reasoning project that came out of my AI Safety Camp project (supervised by Nandi Schoots).

Now I am researching automated red-teaming for safety evaluation with Tal Kachman's lab. I’m particularly focused on understanding how LLM-to-LLM interactions differ from human-LLM interactions in a multipolar AI world: Do models exploit linguistic quirks when interacting with each other? Do our current safety evaluation tools adequately replicate real deployment conditions? See my research statement for more on my research interests (it is a bit outdated and does not yet mention jailbreaking).

Wanna talk? Feel free to book a 1-on-1. I'm always happy to chat. Or reach me the old-fashioned way: samuelgerrit.nellessen{at}gmail.com.

Pinned

  1. ChemControl (Public)

    GitHub Repository for the MCC lab ChemControl project

    MATLAB