SHT. 01 OF 04 SCALE 1:1
DWG. CC-2026-001 PERSONAL PROFILE
mostly human

Casey Capshaw

// husband · father · crafter · thinker · conflicted technophile

We're in the midst of the biggest disruption for humanity since the Industrial Revolution. We're back to the drawing board in nearly all aspects of society. Contact me for projects that rethink small-business organization, enhance human performance, and reshape how we will work together and find meaning in an AI-centric world.

ai workflows + org design
human dynamics
pragmatic vision
first principles thinking
colorado
SECTION A
caseycapshaw.com REV. A — 2026
00 — AI DEVELOPMENTS
00 / LIVE FEED — REFRESHED DAILY

AI Developments

TODAY IN AI 2026-04-20

Four blog posts from OpenAI this week teach managers, finance teams, and individual users how to prompt better, build reusable workflows, and personalize their ChatGPT experience. Then there's Hyatt, which is rolling out ChatGPT Enterprise across its entire global workforce using GPT-5.4 and Codex. That's not experimentation anymore. That's infrastructure.

Meanwhile, the research side is asking sharper questions about whether the tools actually hold up. The Amazing Agent Race benchmark caught something worth sitting with: most existing agent evaluations are simple linear chains, with 55 to 100 percent of test instances involving just two to five steps. Models that look capable in tests may be navigating nothing more complex than a hallway. GTA-2 makes a similar point: tool-use benchmarks are misaligned with real-world workflow complexity.

There's a parallel concern inside the models themselves. The diversity collapse paper shows that post-training narrows output variation, which quietly undermines inference-time scaling methods that depend on getting different answers from the same model.

So the enterprise rollouts assume robustness. The benchmarks keep finding brittleness. That gap doesn't resolve itself just because the contracts are signed.

THE WEEK'S ARC 2026-04-13 — 2026-04-20

The week's most revealing detail isn't a model launch. It's the list of things being measured: whether AI sabotages its own research, whether it understands animal biology, whether it can be trusted to reason faithfully, whether it generates fake music convincingly enough to need forensic detection.

The labs are shipping faster than anyone can audit. So the field is quietly building the audit infrastructure in parallel.

Anthropic signed safety MOUs, published RSP Version 3.0, and expanded its Long-Term Benefit Trust board. Google introduced a "cognitive framework for measuring progress toward AGI." Researchers published ASMR-Bench specifically to catch AI sabotaging ML research. AtManRL and "Beyond Surface Statistics" both chase the same ghost: an AI that says it's reasoning but isn't.

That's not a research trend. That's a trust deficit being papered over with benchmarks.

The product announcements kept coming: Claude Opus 4.7, Gemini 3.1 in four flavors, Qwen3.5-Omni at hundreds of billions of parameters, GPT-Rosalind for life sciences. Bigger, faster, more vertical. The capability curve isn't slowing.

Which means the gap between what these systems can do and what anyone can verify about them just got wider again.

Sources
  • ASMR-Bench was created specifically to detect AI sabotaging ML research. → ASMR-Bench: Auditing for Sabotage in ML Research
  • AtManRL targets faithful reasoning by using differentiable attention saliency. → AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
  • Anthropic published Responsible Scaling Policy Version 3.0 this week. → Responsible Scaling Policy Version 3.0
  • Qwen3.5-Omni scaled to hundreds of billions of parameters. → Qwen3.5-Omni Technical Report
  • Google introduced a cognitive framework explicitly for measuring progress toward AGI. → Measuring progress toward AGI: A cognitive framework
01 — WRITING
01 / SUBSTACK — 8-TRACK TO AI

Writing

MAR 18, 2026
The Immortal Employee
What if the smartest person on your team never actually left? On AI, institutional knowledge, and what it means to retain expertise.
MAR 11, 2026
The Case for Optimism
Why hope is combat, not surrender. A GenX take on staying constructive in the middle of a technological transition most people still don't believe is real.
FEB 24, 2026
The Napster Moment for AI
The revolution went open source. The gatekeepers don't stand a chance — and we've seen this movie before.
FEB 22, 2026
The Stream Deck Nobody Asked For
How a bedtime phone session turned into a GitHub repo by morning. On building with AI, late-night itch-scratching, and shipping before breakfast.
All Articles ↗
02 — BUILDING
02 / OPEN SOURCE

Building

★ 3
TypeScript · AI Agents
OperatorOne
AI-agent-centric operations platform for solopreneurs and small teams. Brings agentic workflows to everyday business operations.
Public · View repo →
★ 1
Python · Developer Tools
streamdeck-claude-code
Stream Deck icon pack and config generator for controlling Claude Code via Ghostty. Hardware shortcut layer for AI-assisted development.
Public · View repo →
03 — WORKING
03 / EXPERIENCE

Working

2026 — PRESENT
Strata Identity
Director of Engineering Operations
AI agentic workflows, context engineering for business applications, AI prototyping, and organizational design. Leading the intersection of people systems and AI-native operations.
AI Workflows · Context Engineering · AI Prototyping · Org Design · SaaS Ops
current role
2024 — 2026
Strata Identity
Director of Operations & People
Strategic advisor to executive team. Led ADR framework for employee engagement, retention, and leadership development. Implemented scalable systems for conflict resolution and culture-building.
People Strategy · Leadership Dev · Retention
2021 — 2024
Strata Identity
Director of Operations / Director of Customer Success
Orchestrated global operational systems and SaaS tool implementations (HubSpot, Salesforce). Established and scaled Customer Success for enterprise clients, improving engagement, retention, and NPS.
Operations · Customer Success · HubSpot · Salesforce
2019 — 2020
Accenture
Tech Advisory Consulting Manager
Partnered with senior leadership of a major telecom on organizational design, change management, and Agile implementations. Led Salesforce migration impact assessment.
Consulting · Org Design · Change Mgmt · Agile
04 — CONTACT
04 / CONTACT

Get in touch.

Substack ↗ LinkedIn ↗ GitHub ↗ X ↗