Backdoor Attacks

Safety checks usually look for the model doing something strange. That sounds reasonable. A compromised model should produce a strange phrase, repeat a suspicious payload, ignore the image, or behave in a way that feels obviously detached from the input. This is the comforting version of AI security: attackers leave fingerprints, defenders look for fingerprints, and everyone goes home after filling out a procurement checklist. ...