Automated alerts for proactive troubleshooting
Role
Product Designer
Timeline
3 weeks
Company
Bloomfield Robotics
Description
Everyone had a solution they wanted to build, but no one could articulate the actual problem.
Problem
Customers were waiting on time-sensitive insights, but we were the last ones to know something had gone wrong.
When a scan failed, the first signal was a customer complaint. The issue would travel through a long communication chain before reaching someone who could actually diagnose it. Engineers were pulled off feature work to investigate failures they hadn't seen coming.

Solution
Automated alerts fit into existing workflows and surfaced issues before customers noticed.
They consolidated tooling and gave teams visibility into the troubleshooting process.

Impact
8h → 2-3h
Reduced average resolution time
Roadmap updates
Now include initiatives for a proactive issue-resolution strategy
Reduced
Fragmentation in the cross-functional troubleshooting process
Untangling the root problem
Research
Interviews revealed a fragmented process.
I started by interviewing engineering, QA, and customer support teams. Each team had a solution they wanted to build.
Engineering
"We need queue management."
Support wanted to prioritize customers' scans. Engineers wanted a tool to reorder the processing queue.
QA
"We need pipeline tracking."
When scans failed, engineers wanted visibility into exactly where in the pipeline the failure occurred.
Customer Support
"We need escalation clarity."
Support fielded complaints but couldn't tell if issues were being investigated or had been overlooked.
When I asked teams to explain how failures actually appeared, a pattern emerged:
"
We have to be paying attention to all these different, nuanced things that aren’t always obvious.
– QA
"
It's a lot of putting puzzle pieces together… I'm looking for the route that makes the most noise.”
– Engineering
Key insight
Journey maps showed where failures went unnoticed.
Understanding the fragmentation led me to map the entire troubleshooting journey. I needed to see where things actually broke down.
System
Silent failure
Processing stops; the only alerts sent are noisy and easy to miss.
Invisible
System has no mechanism to surface the issue internally.
Fragmentation
Logs exist across Honeycomb, Dagster, and the dashboard. No unified view.
Manual fix
Engineer resolves, but no record is created for future incidents.
Customer
Unaware
Customer continues to expect scan delivery.
Reports issue
Customer notices missing scan and contacts support. Insight window may have already closed.
Waiting
No visibility into status or expected resolution time.
Notified
Customer success relays resolution. No explanation of cause.
Customer Support
Escalate
Receives customer complaint. Routes to engineering with no diagnostic info.
Waiting
No way to check scan status. Waiting on engineering updates.
Resolved
Relays resolution to customer.
Engineering
Firefighting
Pulled from feature work. Starts investigation from scratch, checking logs across multiple tools.
8 hours later
Resolves. No pattern recorded. Cycle repeats next incident.
Key findings
Areas of opportunity emerged from looking at the whole process.
1
Diagnosis
Diagnosing an issue is difficult due to the volume of varied, obscure failures.
2
Communication
Communication across teams turns troubleshooting into a multi-step process.
3
Transparency
Both customers and internal teams lack insight into troubleshooting efforts.
Ideation
I asked teams to brainstorm ideas that addressed root problems, not just immediate pain points.

Collaborative ideation helped us come up with solutions that would actually be adopted.
Solutioning
Prioritizing with product revealed alerts as a quick win.
I collaborated with the product team to evaluate solutions against the three problem areas. We looked for a quick win that could address all three.
What we considered, but didn't explore:
Central dashboard to view scan status
Assign a reporting lead for the resolution team
Dashboard notes for investigation
Each option was weighed against the same three problem areas: diagnosis, communication, and transparency.
Why alerts solved the core problem
Diagnosis
Alerts surface stalled scans automatically, before anyone has to look for them.
Communication
Alerts go directly to engineering Slack, creating an automatic coordination thread.
Transparency
Everyone can see when issues are detected and who's investigating.
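The case study doesn't include implementation details, so the sketch below is only an assumption of how this pattern typically works: a scheduled job flags scans whose pipeline hasn't progressed within a threshold, then posts them to the engineering Slack channel through an incoming webhook. The scan fields, the 30-minute threshold, and the fetch_active_scans helper are all hypothetical, not Bloomfield's actual code.

```python
import os
import time
import requests

# Hypothetical config: the Slack incoming webhook URL and the stall
# threshold are illustrative assumptions.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]
STALL_THRESHOLD = 30 * 60  # flag scans with no progress for 30+ minutes


def fetch_active_scans():
    """Placeholder for a query against the pipeline's state store."""
    return [
        {"id": "scan-123", "status": "processing",
         "stage": "stitching", "last_progress_at": time.time() - 3600},
    ]


def find_stalled_scans(scans):
    """Return scans still processing but past the stall threshold."""
    now = time.time()
    return [
        s for s in scans
        if s["status"] == "processing"
        and now - s["last_progress_at"] > STALL_THRESHOLD
    ]


def post_alert(scan):
    """Post a stalled-scan alert to the engineering Slack channel."""
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": (
            f":warning: Scan {scan['id']} stalled at stage "
            f"'{scan['stage']}': no progress in 30+ minutes."
        )},
        timeout=10,
    )


# Run on a schedule (cron, a pipeline sensor, etc.): detect, then alert.
for scan in find_stalled_scans(fetch_active_scans()):
    post_alert(scan)
```

Landing alerts in a channel engineers already watch, rather than in a new tool, is what kept the lift low: there was nothing new to remember to check.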
Pivot
Alerts were a low-lift way to move from reactive firefighting to proactive issue detection.
The fragmented, reactive steps from the beginning began shifting toward a proactive process.

Design
But customers still had no way to see scan progress or understand when issues were being resolved.

The alerting system solved the immediate problem of proactive detection, but internal visibility was only one piece.
I collaborated with another product designer to explore customer-facing dashboard concepts.
Key features explored

Issue reports and ETA
Surface server outages along with an estimated completion time.

Pipeline progress
Step-by-step breakdown of AI processing stages.

Report CTA
Enable faster support contact with a report button.
Time constraints meant these customer-facing updates didn't ship during my time at Bloomfield, but the research validated customer needs, and the updates are now part of the product roadmap.
Reflections
Building for complex systems requires slowing down to understand problems.
1
Research created necessary friction
In a fast-moving startup, slowing down felt counterintuitive. But research gave teams permission to align around shared problems rather than competing solutions. This ultimately sped up delivery of meaningful fixes.
2
Finding the right experts
No single engineer understood the full system. I learned to approach problems with general understanding and ask experts quick questions when I needed clarity on specific decisions.
3
Adoption over elegance
The most beautiful solution is not the most used. The Slack alert wasn't elegant, but it worked. It poked engineers when something needed fixing rather than creating another tool they had to remember to check.
