OpenWatch: Building an AI-Powered Security Intelligence Platform for Nigeria
On a typical day, OpenWatch reads about 200 Nigerian news articles, identifies roughly 30 candidate security incidents, and deduplicates those 30 down to about 12 unique events.
The hardest part of building it wasn't any of the AI. It was teaching the system that “unknown gunmen attacked” is the same incident no matter which newspaper wrote about it first.
Why this matters
Nigeria has 36 states, dozens of news outlets, and no single source of truth for what's actually happening on the ground. Journalists, researchers, and security professionals piece together the picture by hand: opening five tabs, cross-referencing names, guessing whether the bandit attack on Tuesday is the same incident as the cattle raid on Wednesday.
I built OpenWatch to do that work automatically: ingest the news, extract the incidents, deduplicate them semantically, and publish a real-time intelligence map with daily briefs.
What I assumed
When I started, I assumed this would be a model problem. Better LLM, better prompts, ship faster.
I was wrong. It was a domain problem all the way down. Every difficulty I ran into (false positives, weird taxonomy edges, deduplication failures, low-confidence extractions) turned out to be a question I couldn't answer myself, never mind a model. What counts as an “incident”? When is a celebrity recount different from a primary report? When are two outlets covering the same attack versus two different ones?
You can't prompt your way out of questions the domain itself hasn't decided.
The unknown gunmen problem
The clearest example is the phrase “unknown gunmen”. In Nigerian crime reporting it's everywhere. It describes an armed attack where the assailants weren't identified or didn't claim responsibility, which is most of them.
A naive prompt sees “unknown gunmen” and tries to classify it as a specific group. A naive dedup pipeline sees it twice and can't tell if it's the same attack or two different ones.
The fix wasn't a smarter model. It was a system prompt that knew Nigerian geography (all 36 states + FCT), distinguished Nigerian English from Pidgin, enforced a strict taxonomy of 12 incident types (terrorism, kidnapping, banditry, armed robbery, communal violence, cattle rustling, and so on), and included explicit rules to reject things that look like incidents but aren't: celebrity recounts of past events, court proceedings about old cases, aftermath stories framed in the present tense.
Classification accuracy comes from domain specificity, not model size.
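To make that concrete, here is a minimal sketch of how such a system prompt might be assembled. The function name, the rule wording, and the structure are illustrative, not OpenWatch's actual code; the taxonomy below lists only the six incident types named above, with the rest of the twelve elided.

```python
# Illustrative sketch: a Nigeria-specific system prompt for incident
# extraction, assuming a generic chat-completion API downstream.

INCIDENT_TYPES = [
    "terrorism", "kidnapping", "banditry",
    "armed robbery", "communal violence", "cattle rustling",
    # ...six more types in the full twelve-type taxonomy
]

REJECTION_RULES = [
    "Reject celebrity recounts of past events.",
    "Reject court proceedings about old cases.",
    "Reject aftermath stories framed in the present tense.",
]

def build_system_prompt() -> str:
    """Assemble the classification prompt with the domain rules baked in."""
    return "\n".join([
        "You extract security incidents from Nigerian news articles.",
        "Locations must resolve to one of Nigeria's 36 states or the FCT.",
        "Articles may mix Nigerian English and Pidgin; treat both as valid.",
        "Classify each incident as exactly one of: "
        + ", ".join(INCIDENT_TYPES) + ".",
        "If attackers are unidentified ('unknown gunmen'), record the actor",
        "as unknown rather than guessing a specific group.",
        *REJECTION_RULES,
    ])
```

The point of encoding the rules as data rather than free text is that the rejection list can grow as new false-positive patterns show up, without rewriting the prompt.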
Embeddings as a deduplication engine
Most projects use vector embeddings for retrieval. OpenWatch uses them for deduplication.
Five outlets cover the same attack. Each one frames it slightly differently: different fatality count, slightly different location, different tone, different time of day. Traditional dedup (string matching, keyword overlap) fails immediately. An LLM “are these the same?” check is too slow and too expensive at scale.
The fix: embed every extracted incident, then use pgvector's cosine distance to find semantically similar records inside a 72-hour window. Combine that with geographic proximity and temporal closeness into a multi-signal fusion score. Above the threshold, merge them and link the new source to the existing incident, which improves its confidence over time. Below the threshold, treat them as separate.
The reframe was using embeddings as the spine of an operational system, not as a feature.
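A sketch of that fusion step, assuming a Postgres table named `incidents` with a pgvector `embedding` column. The SQL uses pgvector's real `<=>` cosine-distance operator, but the weights, the 50 km decay, and the 0.75 merge threshold are illustrative stand-ins, not the production values.

```python
# Illustrative sketch of multi-signal dedup: semantic similarity from
# pgvector, plus geographic and temporal closeness, fused into one score.

import math

# Candidate query: nearest neighbours by cosine distance (<=>) inside
# the 72-hour window around the new incident's timestamp.
CANDIDATE_SQL = """
SELECT id, embedding <=> %(emb)s AS cos_dist, lat, lon, occurred_at
FROM incidents
WHERE occurred_at BETWEEN %(t)s - INTERVAL '72 hours'
                      AND %(t)s + INTERVAL '72 hours'
ORDER BY embedding <=> %(emb)s
LIMIT 20;
"""

def fusion_score(cos_dist: float, km_apart: float, hours_apart: float) -> float:
    """Blend semantic, geographic, and temporal closeness into one [0, 1] score."""
    semantic = 1.0 - cos_dist                      # cosine similarity
    geographic = math.exp(-km_apart / 50.0)        # decays over ~50 km
    temporal = max(0.0, 1.0 - hours_apart / 72.0)  # linear decay over the window
    return 0.6 * semantic + 0.25 * geographic + 0.15 * temporal

MERGE_THRESHOLD = 0.75  # illustrative

def is_duplicate(cos_dist: float, km_apart: float, hours_apart: float) -> bool:
    return fusion_score(cos_dist, km_apart, hours_apart) >= MERGE_THRESHOLD
```

Two outlets covering the same attack (near-identical embedding, a few kilometres and hours apart) land well above the threshold; a different attack in another state three days later lands well below it.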
Confidence as a publishing gate
Every incident gets a composite confidence score: source tier (Tier A outlets like Punch and Premium Times score higher than unverified social media), number of corroborating sources, extraction clarity, location specificity. The dashboard publishes anything above the threshold automatically. Below it, the incident sits in a review queue.
This is the part nobody asks about, and the one that took the longest to tune. Set the threshold too low and the map fills up with noise nobody trusts. Set it too high and breaking news sits invisible for hours.
The right answer wasn't a number. It was a feedback loop. Confidence updates as new sources arrive, so an incident that started below the bar can crawl above it as more outlets report. The publish gate isn't a moment. It's a sliding decision.
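A minimal sketch of that sliding gate, assuming incidents are simple dicts. The tier weights, signal weights, and the 0.6 threshold are illustrative, not the tuned production values.

```python
# Illustrative sketch: composite confidence from the four signals
# (source tier, corroboration count, extraction clarity, location
# specificity), re-scored every time a new source arrives.

TIER_WEIGHT = {"A": 1.0, "B": 0.7, "C": 0.4}  # Tier A: outlets like Punch, Premium Times
PUBLISH_THRESHOLD = 0.6  # illustrative

def confidence(incident: dict) -> float:
    """Composite confidence in [0, 1]."""
    tiers = [TIER_WEIGHT[s["tier"]] for s in incident["sources"]]
    corroboration = min(1.0, len(tiers) / 3.0)  # saturates at three sources
    return (0.35 * max(tiers)
            + 0.25 * corroboration
            + 0.20 * incident["extraction_clarity"]
            + 0.20 * incident["location_specificity"])

def on_new_source(incident: dict, source: dict) -> bool:
    """Re-score when another outlet reports; returns True once publishable."""
    incident["sources"].append(source)
    incident["confidence"] = confidence(incident)
    return incident["confidence"] >= PUBLISH_THRESHOLD
```

An incident first reported only by an unverified Tier C source starts below the bar; when a Tier A outlet corroborates it, the re-score pushes it over and it publishes automatically.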
The stack
Postgres with pgvector for embedding storage and cosine-distance search, an LLM extraction layer driven by the Nigeria-specific system prompt, the multi-signal fusion scorer for dedup, composite confidence scoring with a review queue, geocoding to place incidents on the map, and real-time subscriptions feeding the dashboard and daily briefs.
The takeaway
The foundational technical work (the pipeline, the embeddings, the real-time subscriptions, the geocoding) turned out to be the easy part. The hard part was the intelligence layer on top: the questions I had to answer about Nigeria's security situation before I could ask a model to answer them.
Building a domain-specific AI system means becoming a junior expert in the domain first. Skip that step and you'll ship a product that's impressive in a demo and wrong in production.
What’s next
The next step for OpenWatch is moving from real-time intelligence to predictive intelligence, using the same pipeline to flag pattern shifts before they make the news. I'm curious how much of that needs new infrastructure and how much is just better questions.