Detection engineering for lean teams
Most organizations don’t suffer from a lack of alerts. They suffer from a lack of useful alerts that someone actually has time to investigate. This article looks at how to design a small, durable signal set that reflects real attacker behavior and the realities of limited people, time, and tooling. The objective is not maximal coverage on paper. It’s a signal set your team can own, tune, and repeatedly act on without burning out.
Why fewer, better signals is a survival strategy
A good signal doesn’t just say “something happened.” It tells a human what kind of decision is now on the table: escalate, contain, investigate, or ignore. A lean team with a few engineers and no 24/7 SOC must assume they will ignore most low-value alerts. Detection engineering must start from that reality, not fight it. The real success metric is how often a fired detection meaningfully changes what your team does next, not how full the SIEM looks.
Detection under real-world constraints
Before you design or tune any rules, you need an honest view of the environment and the team that will live with them.
Where does your risk really live?
- Are you primarily exposed through M365, Google Workspace, and other SaaS?
- Do on-prem AD and legacy Windows systems still matter a lot?
- Which systems would hurt most if quietly compromised for 60–90 days?
Who actually works the alerts?
- Is there a dedicated SOC, or is incident response “a hat” for admins?
- How many hours per week can realistically go to alert triage?
- Who has authority to contain accounts, devices, or apps quickly?
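A rough calculation turns the triage-hours question into a ceiling on how many alerts the team can realistically work each week. The sketch below assumes a fixed average triage time per alert and holds back a share of the hours for the few alerts that become real investigations. The function name and both default numbers are hypothetical placeholders, not benchmarks.

```python
def weekly_alert_budget(triage_hours_per_week: float,
                        minutes_per_alert: float,
                        investigation_reserve: float = 0.3) -> int:
    """Estimate how many alerts a team can triage per week.

    investigation_reserve is an assumed fraction of triage time held back
    for alerts that turn into deeper investigations.
    """
    if minutes_per_alert <= 0:
        raise ValueError("minutes_per_alert must be positive")
    if not 0 <= investigation_reserve < 1:
        raise ValueError("investigation_reserve must be in [0, 1)")
    usable_minutes = triage_hours_per_week * 60 * (1 - investigation_reserve)
    # Round down: a partially triaged alert is not a triaged alert.
    return int(usable_minutes // minutes_per_alert)


# Example: 6 hours/week of triage, ~15 minutes per alert, 30% held in reserve.
print(weekly_alert_budget(6, 15))  # 16
```

If the signals you plan to deploy would routinely fire more often than this number, the design needs fewer or quieter signals, not more analyst heroics.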
What’s in your telemetry stack?
- Do you have centralized logs (SIEM, XDR) or mostly console views?
- How far back can you look (retention) when investigating?
- Which platforms already raise “good enough” signals natively?
What does a “good signal” look like?
It is decision-ready: given only the alert payload and linked context, a human can decide “Do we ignore, watch, escalate, or contain?” If they can’t, the signal needs more context or should be downgraded to background telemetry. It tracks attacker objectives and constraints, not obscure implementation details, so it still holds if tooling, port numbers, or endpoints change slightly. And it is durable: lean teams can’t babysit brittle rules, so the best signals are built from relatively stable fields and behaviors (identity changes, app registrations, rare admin operations) rather than fragile one-off patterns.
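One way to enforce the “can a human decide?” test is to check each alert for the context an analyst needs before it is allowed to page anyone, and route it to background telemetry otherwise. This is a minimal sketch; the field names in `REQUIRED_CONTEXT` and the `route_alert` function are hypothetical, so substitute whatever your SIEM or XDR actually emits.

```python
from typing import Any

# Hypothetical context fields an analyst needs to choose
# ignore / watch / escalate / contain without hunting through consoles.
REQUIRED_CONTEXT = ("user", "user_role", "action", "target", "recent_signins")


def route_alert(alert: dict[str, Any]) -> str:
    """Return 'page' if the alert is decision-ready, else 'telemetry'."""
    missing = [f for f in REQUIRED_CONTEXT if alert.get(f) in (None, "", [])]
    if missing:
        # Not enough context for a decision: keep it as searchable telemetry
        # and record which enrichment would make it actionable.
        alert["missing_context"] = missing
        return "telemetry"
    return "page"
```

Recording `missing_context` turns every downgrade into a to-do item for enrichment rather than a silently discarded alert.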
Examples
New high-risk app consent + atypical user
- Trigger when an OAuth app is granted broad mail, file, or directory scopes.
- Prioritize when the requester is not a known admin / automation owner.
- Investigation: verify legitimacy of the app, owner, and intended use.
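The app-consent example can be sketched as a small scoring function. It assumes consent audit records have already been normalized into a hypothetical `ConsentEvent` shape, and it uses a few Microsoft Graph permission names as an illustrative high-risk list; the list, the field names, and the three-level output are assumptions to adapt to your tenant.

```python
from dataclasses import dataclass

# Illustrative high-impact Microsoft Graph permissions; tune to your tenant.
HIGH_RISK_SCOPES = {
    "Mail.Read", "Mail.ReadWrite", "Mail.Send",
    "Files.Read.All", "Files.ReadWrite.All",
    "Directory.ReadWrite.All", "Sites.ReadWrite.All",
}


@dataclass
class ConsentEvent:
    # Hypothetical normalized shape of an OAuth consent audit record.
    app_name: str
    requester: str
    scopes: set[str]


def score_consent(event: ConsentEvent, known_owners: set[str]) -> str:
    """Classify a consent grant as 'ignore', 'review', or 'priority'."""
    if not event.scopes & HIGH_RISK_SCOPES:
        return "ignore"
    owners = {o.lower() for o in known_owners}
    # Broad scopes granted by someone outside the admin/automation owners
    # is the combination this signal is meant to surface first.
    if event.requester.lower() not in owners:
        return "priority"
    return "review"
```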
Mailbox rule or forwarding change + external recipient
- Trigger on new rules that auto-forward, delete, or hide messages.
- Highlight rules targeting external domains or shadow folders.
- Investigation: confirm rule owner, recent sign-in, and business purpose.
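The mailbox-rule checks translate into a few explicit flags. This sketch assumes inbox-rule changes are normalized into a hypothetical `InboxRule` record; the `HIDING_FOLDERS` names are an illustrative guess at rarely opened folders used to bury replies, not a definitive list.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative rarely opened folders that can hide replies (assumption;
# extend with names you actually see in your tenant).
HIDING_FOLDERS = {"rss feeds", "rss subscriptions", "conversation history"}


@dataclass
class InboxRule:
    # Hypothetical normalized inbox-rule record; map your audit fields onto it.
    owner: str
    forward_to: list[str] = field(default_factory=list)
    deletes_message: bool = False
    move_to_folder: Optional[str] = None


def rule_flags(rule: InboxRule, internal_domains: set[str]) -> list[str]:
    """Return the reasons a new or changed inbox rule deserves a look."""
    internal = {d.lower() for d in internal_domains}
    flags = []
    external = [
        addr for addr in rule.forward_to
        if addr.rsplit("@", 1)[-1].lower() not in internal
    ]
    if external:
        flags.append("external_forward")
    if rule.deletes_message:
        flags.append("deletes_messages")
    if rule.move_to_folder and rule.move_to_folder.lower() in HIDING_FOLDERS:
        flags.append("hides_messages")
    return flags
```

An empty list means the rule is unremarkable; any flag is a prompt to check the rule owner’s recent sign-ins.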
Abnormal remote admin behavior
- Trigger when a rarely used account suddenly uses RDP/PsExec/WinRM broadly.
- Combine with failed logon patterns and unusual time-of-day activity.
- Investigation: pivot into device logs, recent changes, and credential use.
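The remote-admin example boils down to comparing recent fan-out against each account’s history. In this sketch a rarely used account has an empty or tiny baseline, so every host it touches counts as new. The `AdminEvent` shape, the five-host threshold, and the 07:00 to 19:00 business-hours window are all hypothetical defaults.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

REMOTE_ADMIN_METHODS = {"rdp", "psexec", "winrm"}


@dataclass
class AdminEvent:
    # Hypothetical normalized remote-admin record.
    account: str
    host: str
    method: str
    time: datetime
    success: bool = True


def remote_admin_anomalies(baseline, window, new_host_threshold=5,
                           business_hours=range(7, 19)):
    """Flag accounts that fan out to many hosts they never administered before."""
    seen = defaultdict(set)
    for e in baseline:
        if e.method.lower() in REMOTE_ADMIN_METHODS and e.success:
            seen[e.account].add(e.host)

    new_hosts = defaultdict(set)
    failures = defaultdict(int)
    off_hours = defaultdict(int)
    for e in window:
        if e.method.lower() not in REMOTE_ADMIN_METHODS:
            continue
        if not e.success:
            failures[e.account] += 1
            continue
        if e.host not in seen[e.account]:
            new_hosts[e.account].add(e.host)
        if e.time.hour not in business_hours:
            off_hours[e.account] += 1

    # Attach failed-logon and time-of-day context so triage starts informed.
    findings = {}
    for account, hosts in new_hosts.items():
        if len(hosts) >= new_host_threshold:
            findings[account] = {
                "new_hosts": len(hosts),
                "failed_logons": failures[account],
                "off_hours_sessions": off_hours[account],
            }
    return findings
```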
A simple detection engineering loop for lean teams
1. Start from scenarios, not rules
Identify 5–10 scenarios that keep you up at night: business email compromise, cloud admin abuse, lateral movement into a key system, etc. Design signals to answer “would we notice this?” rather than starting from generic rule content.
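Keeping scenarios and signals in one small mapping makes “would we notice this?” checkable. The sketch below, with hypothetical scenario and signal names, reports scenarios that no deployed signal covers and deployed rules that serve no scenario (candidates for retirement).

```python
# Hypothetical scenario -> signal mapping; names are illustrative.
SCENARIOS = {
    "business email compromise": ["mailbox_forwarding_change", "high_risk_app_consent"],
    "cloud admin abuse": ["new_global_admin_assignment"],
    "lateral movement into finance server": [],
}


def coverage_report(scenarios, deployed_signals):
    """Return (uncovered scenarios, deployed signals tied to no scenario)."""
    deployed = set(deployed_signals)
    mapped = {s for signals in scenarios.values() for s in signals}
    gaps = sorted(name for name, signals in scenarios.items()
                  if not set(signals) & deployed)
    orphans = sorted(deployed - mapped)
    return gaps, orphans
```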
2. Prototype queries first
Build the searches in your SIEM or XDR before writing any alert rules. See how often the patterns occur and what context you can attach. Only promote a search to an “official signal” when you’re confident its noise level and the workflow around it are acceptable.
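A promotion gate can be as simple as looking at how often a prototype query would have fired per day. This sketch assumes you export daily hit counts from the prototype search; the two-week minimum history and the spike factor are hypothetical defaults to tune against your own triage budget.

```python
from statistics import mean


def ready_to_promote(daily_hits: list[int],
                     max_alerts_per_day: float,
                     min_days: int = 14,
                     spike_factor: float = 3.0) -> tuple[bool, str]:
    """Decide whether a prototype query is quiet and stable enough to promote."""
    if len(daily_hits) < min_days:
        return False, "not enough history"
    avg = mean(daily_hits)
    if avg > max_alerts_per_day:
        return False, f"too noisy: {avg:.1f}/day"
    # A low average can still hide bursts that would swamp a small team.
    peak = max(daily_hits)
    if peak > spike_factor * max_alerts_per_day:
        return False, f"spiky: peak {peak}/day"
    return True, "ok"
```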
3. Attach a mini-playbook
For each promoted signal, keep a 1–2 page runbook: what the alert means, what to check first, where to find relevant logs, and when to escalate. Store it somewhere discoverable.
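If runbooks live in a wiki or repository, a small check can confirm that every promoted signal has one and that it covers the sections described above. The `Runbook` structure and `runbook_gaps` function are hypothetical; they simply mirror those sections as fields.

```python
from dataclasses import dataclass

SECTIONS = ("meaning", "first_checks", "log_sources", "escalation", "location")


@dataclass
class Runbook:
    # Hypothetical structure mirroring the runbook sections described above.
    signal: str
    meaning: str = ""
    first_checks: str = ""
    log_sources: str = ""
    escalation: str = ""
    location: str = ""  # where the runbook is stored (wiki URL, repo path)


def runbook_gaps(promoted_signals, runbooks):
    """Return {signal: [missing sections]} for promoted signals lacking a full runbook."""
    by_signal = {rb.signal: rb for rb in runbooks}
    gaps = {}
    for signal in promoted_signals:
        rb = by_signal.get(signal)
        if rb is None:
            gaps[signal] = list(SECTIONS)
            continue
        missing = [name for name in SECTIONS if not getattr(rb, name).strip()]
        if missing:
            gaps[signal] = missing
    return gaps
```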
4. Hold short “signal reviews”
Once a month, review firing patterns and outcomes. Retire alerts that never lead to decisions. Adjust thresholds and context where analysts struggled to interpret results.
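A monthly review can be grounded in one number per signal: how often a firing actually changed what the team did next. This sketch assumes outcomes are exported as `(signal, outcome)` pairs and treats escalate, contain, and investigate as actionable; that set, the retirement threshold, and the minimum firing count are hypothetical choices.

```python
from collections import Counter

# Outcomes treated as "the alert changed what we did next" (an assumption;
# adapt to the dispositions your ticketing tool records).
ACTIONABLE = {"escalate", "contain", "investigate"}


def review_signals(outcomes, retire_below=0.05, min_firings=5):
    """Summarize one review period.

    outcomes: iterable of (signal_name, outcome) pairs.
    Returns {signal: (firings, actionable_rate, verdict)}.
    """
    fired: Counter = Counter()
    acted: Counter = Counter()
    for signal, outcome in outcomes:
        fired[signal] += 1
        if outcome in ACTIONABLE:
            acted[signal] += 1

    report = {}
    for signal, n in fired.items():
        rate = acted[signal] / n
        if n < min_firings:
            verdict = "too few firings to judge"
        elif rate < retire_below:
            verdict = "retire or rework"
        else:
            verdict = "keep"
        report[signal] = (n, round(rate, 2), verdict)
    return report
```

Rare signals such as canaries will usually land in “too few firings to judge”, which is why the verdict is a prompt for discussion rather than an automatic retirement.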
If you can only invest in a few signals
1. Identity risk events
Focus on new global/tenant admin assignments, high-risk sign-ins, and high-impact app consents. These few events often represent large changes in attacker leverage.
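For admin assignments, the useful context is who granted the role and how new the receiving account is. This sketch assumes role-assignment audit records are normalized into a hypothetical `RoleAssignment` shape; the role list uses a few Microsoft Entra built-in role names as an illustrative set, and the seven-day “new account” window is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative high-leverage Entra ID roles; adjust to your tenant.
PRIVILEGED_ROLES = {
    "Global Administrator",
    "Privileged Role Administrator",
    "Application Administrator",
    "Exchange Administrator",
}


@dataclass
class RoleAssignment:
    # Hypothetical normalized record of a directory role assignment.
    actor: str
    target: str
    role: str
    time: datetime
    target_created: datetime


def assess_assignment(a: RoleAssignment, approved_actors: set[str],
                      new_account_window: timedelta = timedelta(days=7)) -> list[str]:
    """Return reasons this assignment matters; an empty list means not privileged."""
    if a.role not in PRIVILEGED_ROLES:
        return []
    reasons = ["privileged_role"]
    if a.actor.lower() not in {x.lower() for x in approved_actors}:
        reasons.append("unapproved_actor")
    # A brand-new account receiving admin rights is a classic persistence move.
    if a.time - a.target_created <= new_account_window:
        reasons.append("new_account")
    return reasons
```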
2. Mailbox & data exfil red flags
Auto-forwarding, rules that hide or delete, and abnormal sharing changes in storage (SharePoint/OneDrive/Drive) are high-yield signals of data theft and long-term access.
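Sharing changes can be summarized per user so that one analyst glance answers “is this someone quietly moving data out?” This sketch assumes sharing events are normalized into a hypothetical `ShareEvent` record (with `"anyone"` standing in for an anonymous link) and that anonymous links are rare enough in your tenant to flag individually; the bulk threshold of 20 external shares is a placeholder.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ShareEvent:
    # Hypothetical normalized sharing record from storage audit logs.
    user: str
    item: str
    recipient: str  # email address, or "anyone" for an anonymous link


def sharing_red_flags(events, internal_domains, bulk_threshold=20):
    """Return {user: [reasons]} for anonymous links or bulk external sharing."""
    internal = {d.lower() for d in internal_domains}
    anon: Counter = Counter()
    external: Counter = Counter()
    for e in events:
        recipient = e.recipient.lower()
        if recipient == "anyone":
            anon[e.user] += 1
        elif recipient.rsplit("@", 1)[-1] not in internal:
            external[e.user] += 1

    flags = {}
    for user in set(anon) | set(external):
        reasons = []
        if anon[user]:
            reasons.append(f"anonymous_links={anon[user]}")
        if external[user] >= bulk_threshold:
            reasons.append(f"bulk_external_shares={external[user]}")
        if reasons:
            flags[user] = reasons
    return flags
```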
3. Remote admin usage anomalies
Catch accounts that suddenly begin administering devices in ways that don’t match their history, especially when combined with failed logons and odd timing.
4. “Canary” accounts or resources
Use dedicated, non-production identities, mailboxes, or shares as tripwires. Any access to them is inherently suspicious, which simplifies detection logic.
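Because any touch of a canary is suspicious, the detection logic reduces to a membership check. This sketch uses a hypothetical decoy account and file share, a hypothetical `AccessEvent` record, and an assumed allow-list for scheduled tripwire tests; drop the allow-list if you prefer every touch to alert.

```python
from dataclasses import dataclass

# Hypothetical decoy identity and share; replace with your own tripwires.
CANARIES = {"svc-backup-legacy@corp.example", r"\\fs01\finance-archive-old"}


@dataclass
class AccessEvent:
    # Hypothetical normalized access record (sign-in, file read, mailbox open).
    actor: str
    resource: str
    action: str


def canary_hits(events, canaries=CANARIES, approved_testers=frozenset()):
    """Return every event that touches a canary, as actor or as resource.

    approved_testers is an assumed allow-list for scheduled tripwire tests.
    """
    decoys = {c.lower() for c in canaries}
    testers = {t.lower() for t in approved_testers}
    hits = []
    for e in events:
        touched = {e.actor.lower(), e.resource.lower()} & decoys
        if touched and e.actor.lower() not in testers:
            hits.append(e)
    return hits
```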