The LLM revolution has forced computer science educators into a false dichotomy: either abandon take-home assessments entirely in favor of draconian in-class paper exams, or ignore the dissolution of academic integrity while students farm AI for solutions [1].

Both approaches miss the point entirely.

Instead of retreating to monotonous memorization or surrendering to algorithmic automation, we should be asking: What kinds of problems are genuinely engaging to solve? The kinds that students want to work through, that teach real skills, and that resist the temptation to outsource creative thinking to a chatbot.

This spring, I helped redesign Columbia’s required Fundamentals of Computer Systems course (“Fundies”) around this principle. The result? A take-home exam that students actually enjoyed, and learned from 🤯

The Problem with Traditional Assessments

For years, the second midterm in Fundies followed a predictable pattern: students would implement standard C library functions like strcmp in MIPS assembly. Write a for loop. Debug a switch-case. Test basic syntax knowledge.
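
To make that concrete, here is roughly the level of exercise in question: a for loop that sums the integers 1 through 10, written in MARS/SPIM-style MIPS. This is a sketch of the genre, not an actual past exam question.

```asm
# A minimal for loop in MARS/SPIM-style MIPS: sum the integers 1..10 and print the result.
        .text
        .globl main
main:
        li   $t0, 1            # i = 1
        li   $t1, 0            # sum = 0
loop:
        bgt  $t0, 10, done     # while (i <= 10)
        add  $t1, $t1, $t0     #   sum += i
        addi $t0, $t0, 1       #   i++
        j    loop
done:
        move $a0, $t1
        li   $v0, 1            # syscall 1: print integer
        syscall
        li   $v0, 10           # syscall 10: exit
        syscall
```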

But what’s the point? If the goal is assessing understanding of the hardware-software contract and low-level programming concepts, does writing a MIPS for loop really accomplish that? More importantly, would any student describe debugging assembly syntax as fun?

An LLM renders these exercises trivial anyway. Rather than fight this reality or pretend it doesn’t exist, we decided to design around it.

A Different Approach: Reverse Engineering as Assessment

Instead of asking students to write assembly, we gave them a MIPS binary and challenged them to reverse engineer it. Their task: find hidden flags by understanding not just assembly syntax, but the deeper concepts that make low-level programming powerful and dangerous.

[Image: midterm instructions]

The exam became an interactive “capture-the-flag” challenge. Students had to:

  • Analyze assembly code to understand program behavior
  • Exploit stack overflow vulnerabilities to access hidden flags (a sketch of this pattern follows the list)
  • Navigate mixed endianness challenges
  • Watch their exploits subvert the very calling conventions they’d learned in class
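
For readers who haven’t written MIPS in a while, here is a hypothetical sketch, not the actual exam binary, of the stack-overflow pattern behind the second bullet: a function saves $ra on the stack, then reads more bytes into a stack buffer than it reserved, so crafted input can overwrite the saved return address and redirect execution to a flag-printing routine. The flag string, labels, and offsets below are all invented for illustration.

```asm
# Hypothetical vulnerable function in MARS/SPIM-style MIPS (not the exam binary).
        .data
flag:   .asciiz "flag{stack_smashed}"    # invented flag string
        .text
        .globl main
main:
        jal   vulnerable
        li    $v0, 10                    # exit
        syscall

vulnerable:
        addiu $sp, $sp, -24              # frame: 16-byte buffer, padding, saved $ra
        sw    $ra, 20($sp)               # saved return address sits 20 bytes above the buffer
        move  $a0, $sp                   # buffer starts at $sp
        li    $a1, 64                    # bug: accepts up to 63 bytes into 16 bytes of space
        li    $v0, 8                     # syscall 8: read_string
        syscall
        lw    $ra, 20($sp)               # a long enough input has replaced this word...
        addiu $sp, $sp, 24
        jr    $ra                        # ...so control goes wherever the input says

win:                                     # never called directly; reachable only via the overflow
        la    $a0, flag
        li    $v0, 4                     # syscall 4: print_string
        syscall
        li    $v0, 10
        syscall
```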

To support ~500 students without installation headaches, we forked UNSW’s WebAssembly-based MIPS simulator [2]. Students could interact with memory, observe stack overflows in real time, and see security vulnerabilities play out, all in their browser.


Could students use LLMs? Absolutely. The difference was that generic AI solutions wouldn’t carry the whole assignment. Students needed to understand the specific binary, craft precise exploits, and think critically about low-level system behavior. We added randomness (literally an RNG system call) to make each run of the code demand a unique solution. The assessment required genuine engagement with the material and analysis of the simulator’s state, whether or not they used AI tools.
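
To illustrate the randomness idea without revealing exam internals, here is a toy sketch using MARS-style syscalls; the syscall numbers, the secret word, and the XOR mixing are hypothetical stand-ins for whatever the exam actually did. Because a fresh random value is drawn on every run, a flag copied from someone else’s run, or from an LLM that never drove the simulator, simply won’t match.

```asm
# Toy sketch: mix a per-run random value into the hidden flag material (hypothetical).
        .data
secret: .word 0x1337beef          # invented value "hidden" in the binary
        .text
        .globl main
main:
        li   $a0, 0               # RNG stream id
        li   $a1, 256             # exclusive upper bound
        li   $v0, 42              # MARS syscall 42: random int in [0, $a1)
        syscall                   # result returned in $a0
        lw   $t0, secret
        xor  $t0, $t0, $a0        # flag material now differs on every execution
        move $a0, $t0
        li   $v0, 34              # MARS syscall 34: print integer as hex
        syscall
        li   $v0, 10              # exit
        syscall
```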

Student Reception: “This was actually fun”

The response surprised even me. Instead of the usual post-exam hate mail [3], we received emails and forum posts celebrating the exam’s novelty and, crucially, how much fun students had solving it.

[Image: midterm feedback]

Students weren’t just tolerating the assessment; they were genuinely excited about what they’d learned. Some stayed up late not because they had to, but because they wanted to see if they could crack the next flag.

The biggest complaint? We included stealth bonus questions that some students found unfair—a far cry from complaints about irrelevant busywork or impossible memorization tasks.

[Image: midterm negative feedback]

The Real Lesson

LLMs haven’t killed engaging computer science education; they’ve revealed how boring our assignments had become. When students can get AI to write their for loops, maybe the problem isn’t the AI; maybe the problem is that we were asking students…to write for loops.

The solution isn’t to retreat to paper-and-pencil exams or throw up our hands in defeat. It’s to design assessments that are inherently engaging, that teach skills students actually need, and that resist the kind of surface-level pattern matching that makes AI assistance feel like cheating rather than collaboration.

Students want to be challenged. They want to learn things that matter. And yes, they want their coursework to be fun. We just need to be creative enough to give that to them.

Feel free to reach out for the code, solutions, or other resources I recently built for systems programming classes.

Footnotes

  1. I’ve witnessed this firsthand through extensive TA experience and heard countless anecdotes about the devolution of office hours, students spending their time tricking AI into providing full solutions despite prompt restrictions, and so on. Meanwhile, professors paint a perfect picture for publications. Even Anderson Cooper seemed blissfully fine with this trajectory on 60 Minutes, failing to raise counterpoints to Sal Khan’s optimistic take.

  2. Of course, we needed to make many edits to fix bugs and support new extensions (system calls, new instructions, etc.) to ensure a smooth experience for students. Nevertheless, given the stack overflows students were intentionally triggering, the simulator was prone to crashing; after all, breaking it was exactly what the assignment required!

  3. As a TA, I’ve accidentally written some of the lengthiest and most challenging midterms for our hardware class in college—some take-home, some in-class—and weathered the complaints for years. Sorry to my former students.