Research Question | BizLinkPartners Internal Document

Which problem-space decomposition methods can detect "multiple independent decisions mixed in one draft" and MECE violations (gaps/overlaps) in a single short decision document? The candidate families are qualitative problem-structuring methods, statistical/computational techniques (clustering, topic modeling, decision trees), and LLM-hybrid approaches. How do they compare on single-document applicability, detection coverage, implementation cost, and run-to-run reproducibility?

[Note: Focus on methods applicable to ONE short document (not a large corpus). This is NOT about document clustering at scale. The goal is to AUGMENT an existing MECE-checkpoint review, not replace it.]

Context (≈350 words)

We run a review pipeline for Architecture Decision Records (ADRs), operated by a single person. Each submission is a single Japanese draft of roughly 200–2,000 characters following the chain Context → Problem → Problem-points → Tasks → Actions.
Two partitioning criteria are in place today. (a) Four MECE checkpoints derived from Barbara Minto's Pyramid Principle, one per arrow of the chain: does the problem really arise from this context (are unrelated decision chunks mixed in); are problem-points exhaustive, non-overlapping, and of the same kind; do tasks stop recurrence; are there orphan actions. These are semantic judgments. (b) A governance-attribute divergence test, separate in origin from Minto: if independent reversibility, decision drivers, approver, or rollback-trigger/verification-KPI diverge between parts of a draft, suspect two decisions ("1 ADR = 1 decision").
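To make criterion (b) concrete, the following is a minimal sketch of the divergence test, assuming the draft has already been split into parts and that each part's governance attributes have been extracted. The `Chunk` class, its field names, and the sample values are hypothetical and are not part of the existing pipeline. Attributes that a part does not state are skipped rather than counted as divergent.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

# Attribute names mirror criterion (b); the identifiers are hypothetical.
ATTRIBUTES = ("reversibility", "decision_drivers", "approver", "rollback_trigger")


@dataclass(frozen=True)
class Chunk:
    """One part of a draft plus whatever governance attributes it states."""
    label: str
    reversibility: Optional[str] = None
    decision_drivers: Optional[str] = None
    approver: Optional[str] = None
    rollback_trigger: Optional[str] = None


def divergences(chunks):
    """Return (label_a, label_b, attribute, value_a, value_b) tuples.

    A tuple is emitted only when BOTH chunks state the attribute and the
    stated values differ. Unstated attributes (None) are ignored.
    """
    found = []
    for a, b in combinations(chunks, 2):
        for name in ATTRIBUTES:
            va, vb = getattr(a, name), getattr(b, name)
            if va is not None and vb is not None and va != vb:
                found.append((a.label, b.label, name, va, vb))
    return found


# Hypothetical draft split into three parts.
draft = [
    Chunk("part-1", reversibility="reversible", approver="platform lead"),
    Chunk("part-2", reversibility="irreversible", approver="platform lead"),
    Chunk("part-3"),  # states no governance attributes at all
]
result = divergences(draft)
print(result)
```

In this sketch only part-1 and part-2 are flagged, on reversibility. part-3 produces no finding because it states nothing, so the check stays silent when attributes are missing from a draft.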
Known weaknesses: free-form "find the flaw" prompting is unstable across runs; the governance test is hard to apply when raw drafts do not state those attributes explicitly. We are designing an LLM gate node and an author-side self-check, and want to know what third family of methods exists beyond (a) and (b).
Candidate techniques we specifically want evaluated: cluster analysis, topic models (LDA/BERTopic), decision trees, sentence-embedding clustering, graph community detection — plus qualitative problem-structuring methods (issue/logic trees, KJ-method affinity diagramming, IBIS/argument mapping, Soft Systems Methodology, morphological analysis, Cynefin).
Constraints: input is one short Japanese text; the method must run as one node in an LLM pipeline (Cloudflare Workers + LiteLLM) or as a cheap author-side procedure; verdicts must be reproducible run-to-run and explainable to the author with evidence quotes ("why is this judged as two decisions"), because false positives cost author rework.
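The reproducibility and evidence-quote constraints above can each be turned into a mechanical check. The sketch below is one possible way to do that, not an established metric for this pipeline: it scores run-to-run stability as the fraction of run pairs that return the same verdict label, and it rejects any evidence quote that is not an exact substring of the draft. The function names, verdict labels, and sample draft are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def quotes_are_verbatim(draft: str, quotes: Sequence[str]) -> bool:
    """True only if every evidence quote is a non-empty exact substring of the draft."""
    return all(bool(q) and q in draft for q in quotes)


def pairwise_agreement(verdicts: Sequence[str]) -> float:
    """Fraction of run pairs with identical verdict labels (1.0 means fully stable)."""
    pairs = list(combinations(verdicts, 2))
    if not pairs:
        return 1.0  # zero or one run: nothing to disagree with
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical: the same draft sent through the gate node four times.
runs = ["two_decisions", "two_decisions", "one_decision", "two_decisions"]
stability = pairwise_agreement(runs)  # 3 agreeing pairs out of 6

# Hypothetical draft and the quotes a run returned as evidence.
draft = "認証基盤を移行する。あわせてログ保存期間を延長する。"
ok = quotes_are_verbatim(draft, ["認証基盤を移行する", "ログ保存期間を延長する"])
print(stability, ok)
```

Exact-substring matching is deliberately strict. A paraphrased quote fails the check, which keeps the explanation shown to the author tied to text they actually wrote.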

Questions

Qualitative problem-structuring methods: which of issue/logic trees, KJ-method affinity diagramming, IBIS/argument mapping, Soft Systems Methodology, morphological analysis, Cynefin, or others transfer to inspecting a single short decision document? For each transferable method, state what it detects (chunk mixing / gaps / overlaps) and the minimal procedure.
Statistical/computational techniques: can cluster analysis, topic modeling (LDA/BERTopic), decision trees, sentence-embedding clustering, or graph community detection operate on ONE short document (e.g., sentence- or claim-level units)? State minimum data requirements and known short-text adaptations; if a technique fundamentally requires a corpus, say so explicitly.
LLM-hybrid methods: what designs (semantic decomposition into claim/argument graphs with structured output, embedding-similarity separability tests between candidate chunks, argument mining) achieve measurable run-to-run stability and explainable evidence? Cite reported reliability or agreement numbers where available.
Prior art: what tools, papers, or industrial practices machine-assist granularity checking for ADR/RFC/design documents ("1 ADR = 1 decision", decision splitting, scope linting)?
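As a reference point for the single-document question on statistical/computational techniques, the sketch below shows what sentence-level operation on ONE short text can look like. It is an illustration under stated assumptions, not a recommended method: character-bigram Jaccard similarity stands in for sentence embeddings so that the example needs no model, connected components stand in for graph community detection, and the 0.2 threshold is arbitrary. All function names and the sample draft are hypothetical.

```python
import re


def split_sentences(text):
    """Split on Japanese/ASCII sentence terminators and newlines; drop empty units."""
    return [s.strip() for s in re.split(r"[。！？!?\n]+", text) if s.strip()]


def bigrams(s):
    """Set of character bigrams in a sentence."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of character-bigram sets (stand-in for embedding similarity)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def components(units, threshold):
    """Group units into connected components; an edge joins two units
    whose similarity is at least `threshold`. Uses union-find."""
    parent = list(range(len(units)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            if jaccard(units[i], units[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, unit in enumerate(units):
        groups.setdefault(find(i), []).append(unit)
    return list(groups.values())


# Hypothetical three-sentence draft.
text = "認証基盤を移行する。認証基盤の移行は段階的に行う。ログ保存期間を延長する。"
groups = components(split_sentences(text), threshold=0.2)
print(groups)
```

The procedure is deterministic, so the same draft always yields the same grouping, and each group is a list of verbatim sentences that can be quoted back to the author. More than one group is only a hint that chunks may be mixed: surface similarity cannot distinguish two decisions from one decision described with different vocabulary.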

Output

Executive summary (3-5 key findings)
Per-question analysis
Comparison matrix: method × (single-short-document applicability / detects mixing-gaps-overlaps / implementation cost / reproducibility / explainability)
A recommended augmentation stack layered on top of the existing two criteria (MECE checkpoints + governance-attribute divergence test), with must-have / should-have / nice-to-have priority
References with URLs