The Reddit comment that started this
I came across a comment describing a team's process for making a simple DNS change: commit to GitHub, manually copy the PR number into Jenkins, manually trigger an apply pipeline, use custom CLI tooling for approvals, deal with undocumented linting formats. What should have been a 5-minute task had become a multi-hour ordeal — especially painful for non-developer engineers who just needed to update a record.
This isn't an edge case. I've seen versions of this at multiple organisations. And every time, the people who built the tooling genuinely believed they were making things better.
The real problem: we optimise for builders, not users
When a platform team builds internal tooling, they're usually solving for correctness and control — making sure changes are reviewed, auditable, and safe. These are valid goals. But somewhere in the process, the user experience of the person making the change gets deprioritised until it's an afterthought.
The result is tooling that's technically sound but practically unusable. Engineers route around it. They find the backdoor, the manual override, the "quick way" that bypasses the process entirely. And now you have neither safety nor usability.
What the DNS change example actually reveals
Let's break down what went wrong in that Reddit example:
- Manual PR number copying — the pipeline has no integration with the VCS. It's treating humans as the integration layer between two systems that should talk to each other.
- Manual pipeline trigger — automation that requires a human to trigger it is not automation. It's a reminder system with extra steps.
- Custom CLI for approvals — a separate tool for one step of the workflow breaks the mental model. Either everything happens in the CLI or nothing does.
- Undocumented linting formats — if you need tribal knowledge to use the tool, the tool is broken. Documentation is part of the product.
Each of these issues in isolation is minor. Together they create a system where making a DNS change requires expertise that has nothing to do with DNS.
The mindset shift: internal tools are products
The engineering teams using your platform are your users. They have the same right to a good user experience as your external customers. "Good enough for internal use" is not a standard — it's a symptom of deprioritised platform work.
Concretely, this means:
- Every workflow should have a clear happy path that requires no tribal knowledge
- Error messages should tell you what went wrong and what to do about it
- The number of systems a user has to touch for a single logical operation should be minimised
- Non-developer engineers should be able to complete routine operations without filing a ticket with the platform team
How to diagnose your own tooling
Ask someone who didn't build the tool to complete a routine operation — a DNS change, a new service deployment, an environment variable update — while you watch without helping. Don't intervene. Don't explain. Just observe where they get stuck.
Every point of confusion is a UX bug. Treat it like one.
Practical improvements that actually work
1. Collapse multi-system workflows into one trigger
# Bad: manual steps across three systems
# 1. Open PR in GitHub
# 2. Copy PR number
# 3. Paste into Jenkins job parameter
# 4. Click "Build Now"
# Good: one event triggers the whole workflow
# PR opened → GitHub webhook → Jenkins pipeline auto-triggers
# PR number passed automatically via ${{{{ github.event.pull_request.number }}}}
2. Make approval workflows part of the main interface
If your team uses Slack, approvals should happen in Slack. If they live in GitHub, approvals should be PR reviews. Don't introduce a third system for a single step in an otherwise two-system workflow.
3. Write runbooks for every non-obvious operation
The test: could a new engineer who joined last week complete this operation using only the runbook? If the answer is no, the runbook is incomplete. Every operation that touches production should have one.
4. Version and document your linting rules
# .github/CONTRIBUTING.md — make the rules visible
## DNS Record Format
Records must follow this format:
name: subdomain only (no trailing dot)
type: A | CNAME | MX | TXT
ttl: integer in seconds (minimum 300)
# Automate enforcement with pre-commit hooks
# so engineers get feedback before pushing, not after
The question worth asking your team
What's the most frustrating internal tool you use regularly? Not the most complex — the most frustrating. That's your highest-priority platform work. Fix that before adding new features to anything.
Automation should make things easier. If it's making things harder, it's not automation — it's overhead with a pipeline attached.