the past 3 days have been the first time I’ve been hit with the “You’re right! [insert sycophantic rant]” nonsense, which I believe is because its interpretation of my system instructions is giving it more flexibility in how it’s responding. This was coupled with it pulling out emoji on me for the first time in months
I haven’t been trying too hard to pull any jailbreaks with regards to the classic examples (being mean, make me meth, etc.), but GPT-5 is still extremely willing to write malicious code for you with even the slightest hint of “for a defensive research goal…”
As per usual, API calls seem to have almost 0 moderation of this nature, especially for older/cheaper models. I was able to directly replicate some Russian malware that uses AI to generate the payload for funsies, using their system prompts, and it straight up works.
The “agentic” mode stuff and scratchpad/memory they keep adding on seems like they’re working to keep users inside their own walled garden, which makes me wonder which is more profitable for them: getting 1,000x Joe and Jane Schmoes to have a $20 a month subscription they forget about/only use 2-3 times a day, or selling API keys and tokens to companies who will pay through the nose but will also require their own data center to keep up with the fire hose. Honestly, probably the biggest fish is the $500 a month users who are coding up a storm with it/Cursor.