The prompt injection examples in this post are fun, but this remains a big issue:
The fundamental problem here is this: Large Language Models are gullible. […] This is a hard problem to solve, because we need them to stay gullible. They’re useful because they follow our instructions. Trying to differentiate between ‘good’ instructions and ‘bad’ instructions is a very hard—currently intractable—problem.