www.treadstone71.com
Standard Operating Procedure
Problem → Evidence → Scenarios → Attack → Consensus
1. Purpose and mindset
Analysts need a repeatable method that turns messy questions into clear, defensible judgments. The Problem → Evidence → Scenarios → Attack → Consensus method creates that structure. The method forces explicit assumptions, visible logic, and measurable uncertainty instead of intuition wrapped in confident language. Teams treat reasoning as a craft with standards, not as a personal style.
The method supports estimative intelligence, forecasting, and any assessment that influences significant decisions under uncertainty. Every phase expects analysts to write plainly, link claims to evidence, and expose logic to challenge. Strong teams treat the SOP as a discipline, not as a formality.
2. Scope of use
Analysts follow this SOP when they:
- Forecast events, behaviors, or outcomes.
- Assess courses of action for friendly or hostile actors.
- Prepare material for senior decision makers who need clear odds and clear drivers.
Single analysts follow the full method in compact form. Large cells map roles across multiple people and AI agents, while keeping one lead analyst responsible for coherence and standards.
3. Roles and responsibilities
Every run through the method needs clear owners.
Lead analysts steer the work. They define the question with the decision owner, guard tradecraft, keep the group on time, and control scope.
Decision owners set the question, the decision that depends on the answer, the risk posture, and any constraints on collection or publication.
Decomposer roles break the main question into sub-questions and assumptions.
Evidence roles search, triage, and tag sources. They track provenance, bias, contradictions, and relevance to the question.
Scenario roles build futures and assign probabilities tied to evidence and assumptions.
Adversary roles attack reasoning, question evidence, and push for alternative explanations.
Calibration roles track probability discipline, forecast scores, and long-term performance.
Scribe roles maintain logs, decisions, and version history so others can audit the reasoning later.
Small teams blend roles into one person where needed, while keeping each responsibility visible in writing.
4. Phase 0 – Intake and framing
Every strong product starts with a sharp question. Analysts begin by recording the request in a task log with a timestamp, sponsor, and decision link.
The main question must describe:
- Actor or actors.
- Action or outcome.
- Place or domain.
- Time horizon.
- Conditions that count as a resolved outcome.
Analysts then write a resolution rule in plain language. That rule describes the specific observable fact pattern that closes the question as true or false. For example, analysts may write: “Question resolves as correct if country X conducts a publicly acknowledged missile strike against target Y before 30 June 2026.”
Decision context follows. Analysts describe what decision the forecast supports, what risk tolerance the sponsor shows, and which failure hurts more: overestimating the danger or underestimating it. Time and resource limits join the entry.
The phase ends when the question, resolution rule, and decision link appear in one short intake record that any teammate can read without confusion.
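A minimal sketch in Python shows what one such intake record can look like as a structured object. The field names, the example question, and the sponsor are illustrative assumptions, not a mandated schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IntakeRecord:
    """One intake entry; field names here are illustrative, not doctrine."""
    question: str          # actor, action, place or domain, and time horizon
    resolution_rule: str   # observable fact pattern that closes the question
    sponsor: str           # decision owner who set the question
    decision_link: str     # the decision the forecast supports
    risk_note: str         # which failure hurts more, over- or underestimation
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = IntakeRecord(
    question="Will country X strike target Y before 30 June 2026?",
    resolution_rule=("Resolves as correct if country X conducts a publicly "
                     "acknowledged missile strike against target Y before "
                     "30 June 2026."),
    sponsor="Regional planning cell",
    decision_link="Force posture review",
    risk_note="Underestimating the danger hurts more than overestimating it.",
)
```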
5. Phase 1 – Problem decomposition
Complex questions hide many moving parts. Analysts drag those parts into daylight.
The team first lists assumptions. Assumptions cover political factors, economic constraints, social pressures, technology, legal conditions, military capacity, narrative drivers, and environmental factors. Each assumption receives a short label and a one-sentence description.
Analysts then group assumptions into clusters: drivers, constraints, actor intent, capability, timing factors, and third-party influences. Each cluster spawns one or more sub-questions. Every sub-question links back to at least one assumption and addresses a single dimension that analysts can answer with evidence.
For each sub-question, analysts define completion criteria. Those criteria state:
- Which types of evidence answer the question in a meaningful way.
- What minimum confidence level satisfies the task.
- Which sources carry the highest value, if available.
Dependencies between sub-questions receive explicit mapping. For example, a question about strike timing may depend on a question about logistics capacity. Analysts keep dependency chains short where possible, since long chains hide compounding error.
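A short Python sketch shows one way to keep dependency chains honest, assuming the map lives in a simple adjacency dictionary. The sub-question names and the depth threshold are illustrative.

```python
# Each sub-question lists the sub-questions it depends on (names illustrative).
DEPENDS_ON = {
    "strike_timing": ["logistics_capacity", "leadership_intent"],
    "logistics_capacity": ["fuel_stocks"],
    "leadership_intent": [],
    "fuel_stocks": [],
}

def chain_depth(question: str, seen: tuple = ()) -> int:
    """Length of the longest dependency chain below a sub-question."""
    if question in seen:  # guard against accidental cycles
        raise ValueError(f"circular dependency through {question!r}")
    deps = DEPENDS_ON.get(question, [])
    if not deps:
        return 0
    return 1 + max(chain_depth(d, seen + (question,)) for d in deps)

MAX_DEPTH = 2  # long chains hide compounding error; flag anything deeper
for q in DEPENDS_ON:
    depth = chain_depth(q)
    if depth > MAX_DEPTH:
        print(f"Review {q}: dependency chain depth {depth} exceeds {MAX_DEPTH}")
```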
The team tracks assumption volatility from the start. When new evidence forces a change in an assumption, analysts update the record so the team sees how the logic base shifts over time.
Phase 1 ends when the team holds:
- A clear assumption register.
- A list of sub-questions with completion criteria.
- A dependency map that connects sub-questions to each other and to the main question.
6. Phase 2 – Evidence marshalling
Evidence work turns vague thinking into grounded judgment. Analysts use the sub-questions and completion criteria to drive collection.
For each sub-question, evidence roles search for sources that match the needed types: primary documents, direct statements, sensor data, financial records, open reporting, technical telemetry, or structured intelligence from trusted partners.
Analysts tag every source with the following attributes; a minimal record sketch follows the list:
- Type (primary, secondary, tertiary).
- Origin (open, internal, partner, sensor, human report).
- Provenance score on a simple scale that the team understands.
- Known bias or alignment where relevant.
- Date of publication and time of underlying events.
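A minimal record sketch in Python, assuming a five-point provenance scale and the field names shown; both are team conventions, not fixed doctrine.

```python
from dataclasses import dataclass

@dataclass
class SourceTag:
    """Tags for one source; scale and field names are illustrative."""
    source_id: str
    source_type: str   # "primary", "secondary", or "tertiary"
    origin: str        # "open", "internal", "partner", "sensor", "human report"
    provenance: int    # simple 1 (weak) to 5 (strong) scale the team agrees on
    bias_note: str     # known bias or alignment, empty if none recorded
    published: str     # date of publication, ISO format
    event_date: str    # time of the underlying events, ISO format

tag = SourceTag(
    source_id="SRC-0042",
    source_type="secondary",
    origin="open",
    provenance=3,
    bias_note="State-aligned outlet; favors the official narrative.",
    published="2026-01-14",
    event_date="2026-01-12",
)
```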
Analysts then extract structured evidence from each source. Extraction includes short quotations for direct language, paraphrased facts, discrete data points, and any quantitative series relevant to the question. Each extracted item keeps a reference to the source.
Contradictions receive special treatment. Whenever two sources clash on an important fact, analysts record that conflict in a contradiction log with links to both items. High-impact contradictions trigger extra collection, re-contact with human reporters where possible, or technical checks on data integrity.
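A small sketch shows one way to run the contradiction log, assuming a three-level impact label and a print-based follow-up flag; both are illustrative choices.

```python
contradiction_log: list[dict] = []

def log_contradiction(item_a: str, item_b: str, fact: str, impact: str) -> None:
    """Record a clash between two evidence items; impact is low, medium, or high."""
    contradiction_log.append({"fact": fact, "items": (item_a, item_b), "impact": impact})
    if impact == "high":
        # High-impact clashes trigger extra collection, re-contact with human
        # reporters where possible, or technical checks on data integrity.
        print(f"FOLLOW UP: extra collection needed on '{fact}'")

log_contradiction("EV-017", "EV-031",
                  fact="Number of launchers observed at site Y",
                  impact="high")
```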
Evidence work continues until each major sub-question holds enough diverse, relevant, and traceable sources to support judgments at the required confidence level, or until time and collection limits stop further progress. Analysts record any gaps that remain.
7. Phase 3 – Reasoned claims and local judgments
Claims translate evidence into statements that decision makers can use. Analysts write claim sets for each sub-question.
Every claim:
- Uses a single clear sentence.
- Addresses one idea.
- States a position that evidence can support or refute.
Analysts attach explicit evidence references to each claim. Strong practice links more than one independent source to any high-impact claim. The record then notes the strongest piece of counter-evidence and explains why the claim still stands in light of that challenge.
Each claim receives a quantitative confidence value. Teams often map plain-language labels to numerical ranges, for example:
- Low confidence: 0.2 to 0.45
- Moderate confidence: 0.45 to 0.7
- High confidence: 0.7 to 0.9
Analysts pick a number within the band and write a brief explanation for that choice. That explanation points to evidence strength, source diversity, and contradiction handling.
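A brief sketch shows one way to hold a claim to the band it names; the band edges mirror the example ranges above, and the claim record itself is illustrative.

```python
# Band edges mirror the example ranges above; they are a team convention.
BANDS = {"low": (0.20, 0.45), "moderate": (0.45, 0.70), "high": (0.70, 0.90)}

def check_confidence(label: str, value: float) -> float:
    """Confirm the chosen number sits inside its plain-language band."""
    lo, hi = BANDS[label]
    if not lo <= value <= hi:
        raise ValueError(f"{value} falls outside the {label} band {lo}-{hi}")
    return value

claim = {
    "text": "Country X holds fuel stocks for a two-week operation.",
    "evidence": ["EV-017", "EV-022"],   # more than one independent source
    "counter_evidence": "EV-044",       # strongest challenge, recorded anyway
    "confidence": check_confidence("moderate", 0.60),
}
```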
Logic notes for each claim spell out the reasoning path in a few sentences. Analysts show how assumptions, evidence, and prior judgments link together. Claims with weak logic or thin evidence receive revision or removal. No major claim survives without explicit support.
Phase 3 ends when every important sub-question holds a small set of well-sourced, numerically scored claims with documented logic and visible counter-arguments.
8. Phase 4 – Scenario simulation
Scenarios turn scattered claims into coherent futures. Analysts begin by listing main drivers and main uncertainties that emerged from the claims. Drivers may include leadership intent, alliance cohesion, resource constraints, technological maturity, public mood, or adversary counteraction.
The team constructs three to six scenarios that share the same time horizon and scope as the main question. Each scenario represents a distinct outcome pattern. Scenarios must not overlap in outcome definition, yet together they must cover the plausible space around the question.
For each scenario, analysts write:
- A short name that captures the essence.
- A concise narrative that explains how events unfold.
- A mapping back to specific assumptions and claims that support that path.
Analysts then assign probabilities to each scenario. The sum across the set equals 1. The team bases those probabilities on the relative strength of supporting claims, the number and severity of challenges raised so far, and any base-rate data from similar past cases.
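One simple scheme, sketched below, turns relative support into a distribution that sums to 1 by normalizing weights. The scenario names and weights are illustrative, and teams may prefer other elicitation methods.

```python
# Relative support weights per scenario (illustrative values): higher means
# stronger claims, fewer surviving challenges, friendlier base rates.
weights = {
    "Strike before June": 3.0,
    "Coercive posturing only": 4.5,
    "Negotiated pause": 1.5,
}

total = sum(weights.values())
probabilities = {name: w / total for name, w in weights.items()}
assert abs(sum(probabilities.values()) - 1.0) < 1e-9  # the set must sum to 1

for name, p in probabilities.items():
    print(f"{name}: {p:.2f}")
```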
Sensitivity checks follow. The team changes one assumption at a time within reasonable bounds and observes how scenario probabilities shift. A robust scenario set does not collapse or flip wildly when analysts make small, plausible changes. Large shifts signal fragile logic or overconfidence.
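A matching sensitivity sketch, under the same weight-based scheme: scale one weight by a small factor, renormalize, and measure the largest probability shift. The factor and the fragility threshold are illustrative.

```python
def normalized(weights: dict[str, float]) -> dict[str, float]:
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def max_shift(weights: dict[str, float], scenario: str, factor: float) -> float:
    """Largest absolute probability change after scaling one scenario's weight."""
    base = normalized(weights)
    bumped = dict(weights)
    bumped[scenario] *= factor
    return max(abs(p - base[k]) for k, p in normalized(bumped).items())

weights = {"Strike before June": 3.0, "Coercive posturing only": 4.5, "Negotiated pause": 1.5}
shift = max_shift(weights, "Negotiated pause", factor=1.25)  # small, plausible change
if shift > 0.10:  # illustrative fragility threshold
    print(f"Max shift {shift:.3f} looks large: review the logic behind the weights.")
```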
Phase 4 ends when the team holds a scenario set with clear narratives, traceable links to evidence and assumptions, and a probability distribution that survives basic stress tests.
9. Phase 5 – Adversarial red-teaming
Adversarial review strengthens reasoning before reality does. A person or team in the adversary role reviews the work with a mandate to break the argument, not to defend it.
Red-team reviewers attack:
- Hidden assumptions that never received explicit naming.
- Overreliance on single sources or narrow source types.
- Logical jumps that skip intermediate reasoning steps.
- Scenario gaps where plausible futures never appear.
- Signs of groupthink or motivated reasoning.
Reviewers use structured questions such as:
- “What evidence would reverse this judgment tomorrow?”
- “Which actor benefits if our current narrative spreads unchecked?”
- “Where do we treat desire or fear as fact?”
For every challenge, the main team writes a response. Analysts may gather more evidence, adjust claims, alter probabilities, or acknowledge unresolved risk. Any change in probabilities enters the log with a short explanation of the trigger.
Unresolved high-impact uncertainties receive a special list. That list enters the final product so decision makers see where the ground still feels shaky and what observations hold the power to shift judgments later.
Phase 5 ends when adversarial reviewers run out of substantial attacks that the evidence and logic cannot address without major structural change, or when time runs out and the team records outstanding concerns.
10. Phase 6 – Consensus and product
Consensus in this method never means uniform agreement. Consensus means transparent pooling of informed judgments with a record of dissent.
Analysts gather all probability estimates for the main question and for the scenarios. Those estimates come from human analysts and from calibrated AI helpers when available. The team then applies a pooling rule, such as an equal-weight average or a performance-weighted scheme based on past forecast accuracy. The rule remains written and consistent across projects.
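Two common pooling rules in sketch form: an equal-weight average and a performance-weighted average. The estimates and the skill weights are illustrative; in practice the skill weights come from scored past forecasts.

```python
# Probability estimates for the main question (illustrative values).
estimates = {"analyst_a": 0.55, "analyst_b": 0.70, "ai_helper": 0.60}

# Equal-weight pooling: every calibrated voice counts the same.
equal_pool = sum(estimates.values()) / len(estimates)

# Performance weighting: normalized weights reflect past forecast accuracy,
# for example the inverse of each contributor's historical Brier score.
skill = {"analyst_a": 0.5, "analyst_b": 0.3, "ai_helper": 0.2}
weighted_pool = sum(skill[k] * p for k, p in estimates.items())

print(f"equal-weight pool: {equal_pool:.2f}")
print(f"performance-weighted pool: {weighted_pool:.2f}")
```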
The lead analyst writes the BLUF (bottom line up front). That section names:
- The main judgment stated as a forecast or assessment.
- The numerical probability or range.
- The time horizon.
- One or two strongest drivers that support the judgment.
Supporting sections explain the logic path in clear stages. The product walks the reader from drivers and assumptions, through evidence and claims, into scenarios, and finally into the pooled forecast. Language stays concrete and avoids jargon. Every major statement points to annexed material that holds the underlying detail.
The product also lists:
- Main uncertainties that still influence the outcome.
- Potential alternative scenarios with lower probability that still deserve monitoring.
- Indicators that, if observed, should push the judgment up or down.
Annexes contain the full assumption register, sub-questions and completion criteria, evidence tables, contradiction logs, claim sheets, scenario narratives, and the red-team challenge log.
Phase 6 ends when the product reads cleanly for a senior non-specialist reader while still allowing a technical reader to audit every major step through the annexes.
11. Phase 7 – Review, resolution, and learning
Learning gives the method long-term value. After the forecast horizon ends or the event resolves, the team returns to the question.
Analysts apply the resolution rule from Phase 0 to the observed facts and record whether the forecasted outcome occurred. The team then scores the original probabilities using a proper scoring rule such as the Brier score. Scenario forecasts receive similar treatment where possible.
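The Brier score is the mean squared difference between forecast probabilities and resolved outcomes, where 1 marks an event that occurred and 0 one that did not; lower scores are better. A minimal sketch, with an illustrative forecast history:

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Illustrative history: three resolved questions, forecast versus what happened.
forecasts = [0.62, 0.20, 0.85]
outcomes = [1, 0, 1]  # 1 = event occurred, 0 = it did not
print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
```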
An after-action review follows. The team studies:
- Assumptions that failed or held.
- Sources that misled, including patterns of bias or deception.
- Points where group dynamics suppressed dissent.
- Steps in the method that the team rushed or skipped.
Analysts then update guidance, training material, and internal examples. Strong cases enter a library that new analysts study later. Poor cases enter the same library with honest commentary.
Over time, calibration roles track trends in forecast accuracy, common reasoning errors, and improvement after each cycle of refinement. The SOP itself evolves slowly as teams discover better ways to express standards and embed checks.
12. Discipline, metrics, and daily habits
Methods only help when teams treat them as daily habits. Analysts strengthen discipline through simple metrics; one of them appears as a sketch after the list:
- Number of explicit assumptions per major product.
- Percentage of major claims with more than one independent source.
- Frequency of recorded contradictions and resolutions.
- Stability of scenario probabilities under sensitivity tests.
- Forecast scores over rolling windows.
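A minimal sketch of the multi-source metric, assuming claim records that list their independent evidence references; the records shown are illustrative.

```python
# Illustrative claim records; each lists its independent evidence references.
claims = [
    {"id": "C1", "major": True,  "evidence": ["EV-017", "EV-022"]},
    {"id": "C2", "major": True,  "evidence": ["EV-031"]},
    {"id": "C3", "major": False, "evidence": ["EV-044"]},
]

major = [c for c in claims if c["major"]]
multi = [c for c in major if len(c["evidence"]) > 1]
pct = 100 * len(multi) / len(major)
print(f"{pct:.0f}% of major claims rest on more than one independent source")
```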
Leaders review those metrics during regular tradecraft sessions. Junior analysts learn that strong reasoning grows from explicit structure, not from confident tone. Senior analysts model humble language, transparent doubt, and firm resistance to political pressure on numbers.
The Problem → Evidence → Scenarios → Attack → Consensus method then becomes more than a diagram. The method becomes a shared language for thinking under uncertainty, arguing well, and learning from error in a systematic way.
