Juicy Research Ideas and How to Find Them?
How do people come up with research ideas in AI? Will the "AI Scientist" finally make me work full-time on my chicken farm?
This fall I will be mentoring a few UPenn students on research projects. I mentor PhD students and founders every year, but this year I am doing that with Chris Callison-Burch in the context of his CIS 8000 course. This course is aimed at introducing senior undergraduates and master's students at Penn to the craft of AI research, from idea generation to publishing. I wish such opportunities had existed when I was in undergrad or grad school.
Yesterday, we discussed where people get ideas for AI research. The common answers from the student audience were:
Reading other papers
Listening to talks
Scrolling Twitter
Looking at GitHub
Talking to experts and friends
and so on
These are not bad answers, good ones even, but I realized that all of AI research is highly predictable1 and that we could do a better job of characterizing where AI research ideas come from. In fact, it's so predictable that we are now imagining LLMs doing AI research for us.
When Sakana AI made its "AI Scientist" splash, not many could see it for what it was; most instead repeated the company's carefully curated talking points about the research2. The only exception I've seen is this tweet thread from Jimmy Koppel, who dove deeper into the generated papers and concluded:
The Sakana AI Scientist is a masterwork of PR3. They present it: a groundbreaking system that ideates and performs ML research -- and yes, of course it has some flaws and limitations and this is just early work, but it's still a great achievement. But all that's to stop you from looking too closely at what they actually do. Because I don't think there's much there at all.
Before you dismiss the thread as a cynical take, do me a favor and read the Twitter thread and this post in its entirety, then comment on this article.4
Are good AI research assistants or even "scientists" a pipe dream, then? Absolutely not. We are on the cusp of building high-quality AI assistants for almost every knowledge profession you can think of. These assistants will be qualitatively superior to the glorified autocompletes of today. I will talk about that another day, but today let's focus on AI Scientists.
So how does one go about building a better AI Scientist? For this first installment, I will consider idea generation: how do humans produce AI research ideas, especially ideas that result in publication?5
A Catalogue of AI Research Idea Generators
From at least a thousand AI research papers I have read, skimmed, or glanced at during my career, I am offering a preliminary catalogue of AI research idea generators. Each of these is a tried and tested formula/recipe for generating AI research papers:
Invent a hammer and find nails to whack: This is straightforward and the most common approach that newcomers to research imagine when they think about what "doing research" is. Usually, this involves finding a technique or proposing an architecture, and applying it to multiple datasets to show improvements. Examples from the recent past6 include the transformer paper and the chain-of-thought paper.
Find hammers to whack on your nail: Here you invent a task and provide a rationale for why it should be solved and how it is useful for society. Then you painstakingly build a small but high-quality dataset for it, try out different models and techniques on that dataset, and declare a winner. Many new tasks proposed at ACL tend to be like this. Example: See the FEVER paper.
Create a horse race: This is a scaled-up version of #2. Usually, you try out recipe #2 to pilot a task and gauge community interest. If it works, you create a "shared task" (think of it as a Kaggle competition) with a much larger dataset and invite the community to participate. In NLP, the TREC, CoNLL, WMT, and SemEval competitions are prototypical examples of these. An example from computer vision is ImageNet. It is very common for some of these tasks to have a multi-year half-life, but most saturate quickly due to the progress in hammer production. In the post-LLM era, "benchmark papers" are also examples of these horse races.
Climb that hill: This recipe is for the engineers among us. Take an existing idea and make it faster, cheaper, smaller, or better. For example, the original Transformer paper introduced attention computation with a quadratic cost, and then came a slew of papers that reduced the time complexity of the attention algorithm to subquadratic regimes (a minimal sketch of that quadratic bottleneck follows this list).
Play matchmaker: Take something you know well from your community and apply it in another community. Before neural networks were a thing, we would routinely see physicists take ideas from statistical mechanics and introduce them to the ML community. These papers are amazing. Long ago, way before AlphaFold existed, I attended a talk by the awesome Julia Hockenmaier who blew my mind by showing how NLP parsing techniques could be applied to protein folding. In the post-LLM era, invoking this recipe is quite common.
Introduce constraints: Humans, when put in constrained situations, become highly creative. You can use this to create new recipes by taking a well-known problem setting and adding a constraint. Suddenly you have a harder version of the problem demanding novel solutions. Examples of constraints include solving a problem with a tiny training dataset (see papers on low-resource NLP) and making models run on small devices (see work by Tim Dettmers).
Expand the scope: Here you take a well-known task and expand its scope. Make it multilingual. Make it general (limited vocabulary speech recognition vs. open-vocabulary speech recognition).
Mind the gap: Often when you are invested in a problem, you end up reading most papers on that problem, and in the process you might realize there is a blind spot or a "gap" in the literature. These gaps arise for several reasons. Perhaps authors of the previous works made unstated assumptions about something? Maybe an important initial/boundary condition was not examined? In the ACL anthology alone, searching for the phrase "gap in the literature" results in more than 1000 hits. If, for example, all parsing papers you looked at were for left-to-right languages, then clearly the community has a gap in the parsing literature for Semitic languages.
Court the outliers: When empirical results are reported, outliers are usually brushed off or Huberized. This is because, during evaluation, we are interested in measuring how good our model of the world is, and we want those measurements to be reliable (smooth). Complementary to evaluation is the dreaded step, often missed by novice practitioners, of error analysis, where you are required to categorize prediction/generation mistakes and explain them. Sometimes some of the "mistakes" are telling a completely different story. Certain stop sign images in the ImageNet dataset get misclassified even though they are "easy" and clearly visible. It turns out some of them had graffiti on them that confused the ConvNets. Instead of discarding these as bad image inputs, you could look at them from a different perspective: What if stop signs with graffiti are adversarial inputs for image classifiers? Can I put a post-it labeled "chair" on a desk and confuse the classifier? You now have the adversarial attacks paper!
Customer-centric work: We rarely see these kinds of ideas at academic conferences, but this happens a lot in product building. You build a product and release it into the wild. Observe where your users struggle and identify the research questions that could come out of those struggles.
Become an old wine sommelier: Read a lot of old papers and repackage them as new papers. This might sound like fraud, but it might be the most valuable thing you could do for the community. A lot of old ideas are forgotten or abandoned because the people who championed them are no longer in the field, the ideas did not work well with the hardware of their time, and so on. You have a unique opportunity to resuscitate these ideas, and who knows, you might start something like the deep learning revolution!
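To make the quadratic bottleneck behind "Climb that hill" concrete, here is a minimal NumPy sketch of vanilla scaled dot-product attention. The explicit n-by-n score matrix is exactly what the subquadratic follow-up papers try to avoid; the function name and shapes are illustrative, not taken from any particular paper.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Single-head scaled dot-product attention for a sequence of n tokens.

    Q, K, V have shape (n, d). The score matrix S has shape (n, n),
    so time and memory grow quadratically with sequence length n.
    """
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                    # (n, n) scores: the quadratic bottleneck
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)          # row-wise softmax
    return P @ V                                # (n, d) outputs

# Doubling n roughly quadruples the work spent on S; the hill-climbing papers
# (sparse, linearized, or chunked attention) attack exactly this term.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)                  # out.shape == (512, 64)
```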
If we are building AI research assistants, we might want to encode these strategies in the prompts for idea generation. Perhaps we might have a shot at building something useful.
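Here is a minimal sketch of what encoding these recipes into idea-generation prompts might look like. Everything in it is hypothetical: the recipe summaries, the prompt template, and the build_prompts helper are illustrative, and the call to an actual LLM is deliberately left out.

```python
# Hypothetical sketch: encode the recipes above as prompt scaffolding
# for an LLM-based idea generator. The model call itself is left abstract.

RECIPES = {
    "hammer_to_nails": "Take a technique and apply it to several datasets or tasks.",
    "nails_to_hammers": "Define a new task, build a small high-quality dataset, compare methods.",
    "horse_race": "Scale a piloted task into a shared task or benchmark.",
    "climb_the_hill": "Make an existing method faster, cheaper, smaller, or better.",
    "matchmaker": "Port an idea from one community to a problem in another.",
    "introduce_constraints": "Add a constraint (low resource, small devices) to a known problem.",
    "expand_the_scope": "Generalize a task: more languages, open vocabulary, broader domain.",
    "mind_the_gap": "Find unstated assumptions or unexamined settings in the literature.",
    "court_the_outliers": "Treat systematic 'errors' as a phenomenon worth studying.",
    "old_wine": "Revive a forgotten idea under modern hardware and data regimes.",
}

PROMPT_TEMPLATE = """You are helping brainstorm AI research ideas about: {topic}

Use the strategy "{name}": {description}

Propose 3 concrete, testable research ideas. For each, state the hypothesis,
the first experiment you would run, and the most likely failure mode."""


def build_prompts(topic: str) -> list[str]:
    """Render one prompt per recipe; feed each to whatever LLM client you use."""
    return [
        PROMPT_TEMPLATE.format(topic=topic, name=name, description=desc)
        for name, desc in RECIPES.items()
    ]


if __name__ == "__main__":
    for prompt in build_prompts("long-context evaluation of LLMs"):
        print(prompt, end="\n\n---\n\n")
```

The point of this structure is that each recipe becomes an explicit, inspectable lever for the generator, rather than an implicit hope that the model will "be creative".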
All ideas in this post are about AI research. It is possible that some of these are applicable to other areas of CS, or to research in general, but I am not qualified to make that claim.
The one thing I dislike about Twitter/X is how, once something catches momentum, via clickbait or legitimate means, it becomes impossible to insert alternative ideas that force people to think beyond clean and linear narratives. I had these thoughts earlier, but had I not waited for the "AI Scientist" press cycle to cool off, my post would have fed the algorithm and surfaced more "AI Scientist" BS to people.
As if on cue, the AI Scientist appeared today in the research industry's favorite PR outlet, Nature News, unsurprisingly without any depth, and hence I am not linking it here.
Also, the "AI Scientist" in the heading and in the naming of the proof-of-concept is a misnomer. It implies that all a "scientist" does is read and ideate. That could not be further from the truth for the empirical sciences, and for 99.9% of AI.
A lot of valuable ideas are not publishable because of how academic publishing works. The emphasis here on publication-worthiness is simply a proxy for acceptance by a community of peers via a somewhat unreliable criterion called the review process.
I debated whether to share canonical examples from the more distant past (the 70s through the 2000s) here, but after some consideration, I decided to illustrate with examples from recent years to be more accessible to younger readers. An undergrad who wanted to intern for me sincerely believed AI started with AlexNet. Go figure!