
GUI-Eyes: When Agents Learn Where to Look

Screenshots look simple until they are not. A human opening a dense professional application does not inspect every pixel with equal seriousness. We glance, zoom in mentally, ignore decorative clutter, search for the likely region, then focus. In other words, we do not merely “see” the interface. We decide where to look. ...

January 17, 2026 · 15 min · Zelina

Seeing Is Thinking: When Multimodal Reasoning Stops Talking and Starts Drawing

Image work has always had a small credibility problem: people can say where they looked, but we do not always know whether they actually looked there. The same problem shows up in multimodal AI. A model can answer a question about a chart, a photograph, a geometry diagram, or a robotic scene, then produce a neat textual chain of thought afterwards. It may sound procedural. It may mention “examining the relevant region.” It may even say “the graph shows…” with the confidence of a consultant holding a laser pointer. ...

January 15, 2026 · 17 min · Zelina

Seeing Too Much: When Multimodal Models Forget Privacy

Face. That is where the privacy problem starts to become awkward. A company does not need to build a facial-recognition product to create facial-recognition risk. It may only add a multimodal model to a customer-support workflow, an HR document review process, a KYC assistant, a media-monitoring tool, or a claims-processing system. Someone uploads an image. The model sees a person. Then the user asks: Who is this? Where do they live? What is their email? What is their religion? What is their medical condition? ...

January 12, 2026 · 18 min · Zelina

TowerMind: When Language Models Learn That Towers Have Consequences

Tower placement is a small decision until it is wrong. In a tower-defense game, a bad tower is not merely an inelegant plan. It is money spent, coverage lost, enemies leaked, and time wasted. The game does not care that the explanation sounded strategic. It only asks whether the tower actually touches the road. ...

January 12, 2026 · 15 min · Zelina

Hard Problems Pay Better: Why Difficulty-Aware DPO Fixes Multimodal Hallucinations

Training data has a bad habit: the easiest examples talk the loudest. Anyone who has trained a model on preference pairs knows the scene. One answer is clearly grounded in the image; the other confidently invents an object, a color, or an action that is not there. The model learns the contrast quickly. Everyone applauds. The loss goes down. The dashboard looks obedient. ...

January 5, 2026 · 15 min · Zelina

Teaching Has a Poker Face: Why Teacher Emotion Needs Its Own AI

A teacher can say “Good, let’s try again” in at least five different emotional languages. It can mean patience. It can mean disappointment carefully wrapped in professionalism. It can mean encouragement, routine classroom management, mild frustration, or the heroic survival instinct of someone explaining the same concept for the fourth time while thirty students perform collective eye-contact avoidance. ...

December 24, 2025 · 18 min · Zelina

When 1B Beats 200B: DeepSeek’s Quiet Coup in Clinical AI

Chest X-rays are not a glamorous AI benchmark. They are routine, repetitive, and brutally operational. A hospital does not need a model that can write poetry about radiology. It needs reports that are accurate enough, fast enough, structured enough, and cheap enough to run inside an actual clinical workflow without turning the IT department into a cloud-billing support group. ...

December 24, 2025 · 15 min · Zelina

When One Clip Isn’t Enough: Teaching LLMs to Watch Long Videos Like Adults

Video is a terrible place to hide evidence. Not because the evidence is invisible. Because it is usually obvious only after someone has already found the right minute, the right scene, and the right visual detail. A person reviewing a long customer-support screen recording, a training video, a compliance recording, or a surveillance clip rarely watches everything with equal attention. They skim, localize, zoom in, check the detail, and then answer. Primitive, yes. Effective, also yes. ...

December 24, 2025 · 15 min · Zelina

Echoes, Not Amnesia: Teaching GUI Agents to Remember What Worked

Memory is not a folder. A useful employee does not fill out the same form from scratch every morning as if yesterday never happened. They remember which menu hides the export button, which warning can be ignored, which field must be filled before the “Next” button wakes up, and which apparently harmless click sends the process into a small bureaucratic swamp. ...

December 23, 2025 · 17 min · Zelina

When AI Argues With Itself: Why Self‑Contradiction Is Becoming a Feature, Not a Bug

A model generates an image. Then the same model looks at that image and says, in effect, “No, that is not what the prompt asked for.” Awkward? Yes. Useless? Not necessarily. In normal software engineering, a system contradicting itself is usually a defect report with better manners. In modern AI, especially multimodal systems that both generate and understand images, that contradiction may also be a measurement instrument. The embarrassment is the point. A model that can notice its own generation failed has already exposed a useful asymmetry: its evaluator may be stronger than its producer. ...

December 22, 2025 · 15 min · Zelina