Prompt Injection, Model Merging, and Job Stealing Microscopes
✋ This post uses images, so if you are reading this on an email client, enable the “display images from sender” to get the most out of the post. If you are a new subscriber, welcome! If you find these writings valuable, consider sharing them with a colleague or giving a shoutout on Twitter.
If you missed the previous post, you might want to check out the story section — What happens when Transformers are All we need? — where I talk about the consequences of Transformers becoming the common fabric of modeling.
Git Re-basin, a paper that’s taking Twitter by storm because it promises to merge two differently trained models without fine-tuning or pre-training and without taking much hit to the loss.
Why is this exciting? This has applications in distributed training, federated learning, and privacy-preserving learning, and other areas. If you have the budget to read only one paper this week, make it this one.
Prompt injection is like SQL injection, but for prompts. Unlike SQL injection, simple escaping methods don’t seem to work well. Simon Willison wrote a series of nice explainer posts on prompt injection issues that were recently discovered by Riley Goodside. After spending a big chunk of my career on many content related cybersecurity issues like spam, phishing, deepfakes, and misinformation, I disagree with Simon’s conclusion —. This is like saying more software cannot solve software issues. I would rephrase it as “AI alone cannot solve AI issues”, as I say in this Seattle Times op-ed on misinformation.
Of course, technology alone will not help … we will require a multipronged solution involving technology, education and policy. Educators .. are training young minds to be more discerning about what they read online. Countries such as Germany are considering legal options ... Above all, journalists and fact-checkers are working tirelessly to prevent the truth from being stifled or distorted. I am hopeful that, working together, we can build a society cemented in truth.
OpenAI released Whisper, an open source speech recognition toolkit, marking a departure from their previous closed releases like GPT-3 and DALL-E. Kudos to the team, not just for open sourcing, but also in achieving what appears to be spectacular results on benchmark datasets. However, if my experience in speech has taught me anything, it is to not get too excited by WER (word error rates) on benchmarks as they can differ widely on private datasets. With all that abundant caution, I still think this is exciting. Particularly, Whisper’s improvement over NVIDIA’s CTC large model and a variety of industry models (Companies A through D in figure below; Cloud API terms for these companies prohibit benchmarking).
Whisper is multilingual and trained on 680K hours of data, but non-English languages get around 1000 hours each. So the preformance, as expected, varies wildly depending on the language. An interesting fact is Whisper achieves this performance without tricks like Dropout, data augmentation (like SpecAugment), and stochastic depths. This suggests most of the gains are attributable to the increased size of the dataset. I wish they had more disclosure on the dataset (or even release the dataset), so the community can replicate, and understand the biases and other issues coming from these models.
🥛 Water Cooler
In other open source news, PyTorch is now under the Linux Foundation umbrella as PyTorch Foundation. This is important because PyTorch, the most popular framework among researchers, is setting itself apart from the other big frameworks by becoming independent of its founding organization (META). At some point, ML frameworks and libs might become a fundamental part of the OS that we might see it bundled along with the OS like devtools, libc, and the compiler.
The Follower sounds like a movie name, but it isn’t, and it could very well be the plot of one. Belgium-based artist, Dries Depoorter, triangulates open camera feeds with Instagram posts to build a dystopian art demo. What’s worrisome is not that he was able to do it, but that it’s incredibly easy for anyone to replicate these. Now I can no longer post my fake-vacations selfies from Bora Bora.
Nature published an article with a clickbait title (surprise!) — Microscopes are coming for your job. In this case, the language is much deserved. Microscopes are getting upgraded, like how cars are getting (or trying) updated towards autonomy. Gone will be the days of a trained technician fiddling with the knobs to locate the right kind (“phenotype”) of cells and manually count them. These futuristic microscopes will have operators that will simply place the prepared sample, provide an example cell figure (say from a textbook), and say, “find me all occurrences of cells of this phenotype”. The microscope using a combination of segmentation algorithms, image embedding, and retrieval methods, brings up all occurrences of those cells.
This is a big deal because now lab experiments can be shrunk from days to hours, the ensuing cost reduction and the cost reduction by using less qualified operators than technicians. At the limit, it is possible for end customers to own some of these units and do bloodwork at home. Much like autonomous cars, these devices will have their own levels of automation. I really like the last sentence in the paper:
“Microscopes may indeed be coming for your job — but they’re also giving you a promotion.”
This is very much the thesis of my AI automation taxonomy.
Thanks for reading AI Research & Strategy! Subscribe for free to receive new posts and support my work.