Custom AI chatbots are quietly becoming the next big thing in fandom.

A search for tweets about Character.AI quickly reveals that some young, vulnerable users are trying to get their favorite characters to coach them into self-harming or to give them rules for their eating disorders. This is surely one of the outcomes Character.AI hopes its filters will prevent, given that its stated goal is to “give everyone on earth access to their own deeply personalized superintelligence that helps them live their best lives.” (A Character.AI representative did not respond to requests for comment about how the company plans to improve its filters to prevent these kinds of uses.)
The filter is imperfect: in our tests, a bot asked explicit questions about self-harm might type out a fully visible violent answer before the filter kicks in and hides it. And users are endlessly creative, as experiments like ChatGPT’s “Do Anything Now” jailbreak show, so it is always possible that the bots will be put to dangerous uses.
It seems as if these companies keep pouring months or years into building huge models they tout as sophisticated, then “making them safe” with simple filters thrown together in a couple of days.
I guess you could say the same of Twitter and YouTube: they built ways to spread content at massive scale, then slapped on afterthought safety measures that don’t do much.