Automated alerts for proactive troubleshooting
Role
Product Designer
Timeline
3 weeks
Company
Bloomfield Robotics
Description
Everyone had a solution they wanted to build, but no one could articulate the actual problem.
Problem
Customers were waiting on time-sensitive insights, but we were the last ones to know something had gone wrong.
When a scan failed, the first signal was a customer complaint. The issue would travel through a long communication chain before reaching someone who could actually diagnose it. Engineers were pulled off feature work to investigate failures they hadn't seen coming.

Solution
Automated alerts fit into existing workflows and surfaced issues before customers noticed.
They consolidated tooling and gave teams visibility into the troubleshooting process.

Impact
8h → 2-3h
Reduced average resolution time
Roadmap updates
Now include initiatives for a proactive issue-resolution strategy
Reduced
Fragmentation in the cross-functional troubleshooting process
Untangling the root problem
Research
Interviews revealed a fragmented process.
I started by interviewing engineering, QA, and customer support teams. Each team had a solution they wanted to build.
Engineering
"We need queue management."
Support wanted to prioritize customers' scans. Engineers wanted a tool to reorder the processing queue.
QA
"We need pipeline tracking."
When scans failed, engineers wanted visibility into exactly where in the pipeline the failure occurred.
Customer Support
"We need escalation clarity."
Support fielded complaints but couldn't tell if issues were being investigated or had been overlooked.
When I asked teams to explain how failures actually appeared, a pattern emerged:
"
We have to be paying attention to all these different, nuanced things that aren’t always obvious.
– QA
"
It's a lot of putting puzzle pieces together… I'm looking for the route that makes the most noise.”
– Engineering
Key insight
Journey maps showed where failures went unnoticed.
Understanding the fragmentation led me to map the entire troubleshooting journey. I needed to see where things actually broke down.
System
Silent failure
Processing stops; the only alerts sent are noisy and easy to miss.
Invisible
System has no mechanism to surface the issue internally.
Fragmentation
Logs exist across Honeycomb, Dagster, and the dashboard. No unified view.
Manual fix
Engineer resolves, but no record is created for future incidents.
Customer
Unaware
Customer continues to expect scan delivery.
Reports issue
Customer notices missing scan and contacts support. Insight window may have already closed.
Waiting
No visibility into status or expected resolution time.
Notified
Customer success relays resolution. No explanation of cause.
Customer Support
Escalate
Receives customer complaint. Routes to engineering with no diagnostic info.
Waiting
No way to check scan status. Waiting on engineering updates.
Resolved
Relays resolution to customer.
Engineering
Firefighting
Pulled from feature work. Starts investigation from scratch, checking logs across multiple tools.
8 hours later
Resolves. No pattern recorded. Cycle repeats next incident.
Key findings
Areas of opportunity emerged from looking at the whole process.
1
Diagnosis
Diagnosing an issue is difficult due to the volume of varied, obscure failures.
2
Communication
Communication across teams turns troubleshooting into a multi-step process.
3
Transparency
Both customers and internal teams lack insight into troubleshooting efforts.
Ideation
I asked teams to brainstorm ideas that addressed root problems, not just immediate pain points.

Collaborative ideation helped us come up with solutions that would actually be adopted.
Solutioning
Prioritizing with product revealed alerts as a quick win.
I collaborated with the product team to evaluate solutions against the three problem areas. We looked for a quick win that could address all three.
What we considered, but didn't explore:
Central dashboard to view scan status
Assign a reporting lead for the resolution team
Dashboard notes for investigation
Each option was weighed against the same three problem areas: diagnosis, communication, and transparency.
Why alerts solved the core problem
Diagnosis
Alerts surface stalled scans automatically, before anyone has to look for them.
Communication
Alerts go directly to engineering Slack, creating an automatic coordination thread.
Transparency
Everyone can see when issues are detected and who's investigating.
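The case study doesn't include implementation details, so the sketch below is only an assumption of how this pattern typically works: a scheduled job flags scans whose pipeline hasn't progressed within a threshold, then posts them to the engineering Slack channel through an incoming webhook. The scan fields, the 30-minute threshold, and the fetch_active_scans helper are all hypothetical, not Bloomfield's actual code.

```python
import os
import time
import requests

# Hypothetical config: the Slack incoming webhook URL and the stall
# threshold are illustrative assumptions.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]
STALL_THRESHOLD = 30 * 60  # flag scans with no progress for 30+ minutes


def fetch_active_scans():
    """Placeholder for a query against the pipeline's state store."""
    return [
        {"id": "scan-123", "status": "processing",
         "stage": "stitching", "last_progress_at": time.time() - 3600},
    ]


def find_stalled_scans(scans):
    """Return scans still processing but past the stall threshold."""
    now = time.time()
    return [
        s for s in scans
        if s["status"] == "processing"
        and now - s["last_progress_at"] > STALL_THRESHOLD
    ]


def post_alert(scan):
    """Post a stalled-scan alert to the engineering Slack channel."""
    requests.post(
        SLACK_WEBHOOK_URL,
        json={"text": (
            f":warning: Scan {scan['id']} stalled at stage "
            f"'{scan['stage']}': no progress in 30+ minutes."
        )},
        timeout=10,
    )


# Run on a schedule (cron, a pipeline sensor, etc.): detect, then alert.
for scan in find_stalled_scans(fetch_active_scans()):
    post_alert(scan)
```

Landing alerts in a channel engineers already watch, rather than in a new tool, is what kept the lift low: there was nothing new to remember to check.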
Pivot
Alerts were a low-lift way to move from reactive firefighting to proactive issue detection.
The fragmented, reactive steps from the beginning began shifting toward a proactive process.

Design
But customers still had no way to see scan progress or understand when issues were being resolved.

The alerting system solved the immediate problem of proactive detection, but internal visibility was only one piece.
I collaborated with another product designer to explore customer-facing dashboard concepts.
Key features explored

Issue reports and ETA
Surface server outages along with an estimated completion time.

Pipeline progress
Step-by-step breakdown of AI processing stages.

Report CTA
Enable faster support contact with a report button.
Time constraints meant these customer-facing updates didn't ship during my time at Bloomfield, but the research validated customer needs, and the updates are now part of the product roadmap.
Reflections
Building for complex systems requires slowing down to understand problems.
1
Research created necessary friction
In a fast-moving startup, slowing down felt counterintuitive. But research gave teams permission to align around shared problems rather than competing solutions. This ultimately sped up delivery of meaningful fixes.
2
Finding the right experts
No single engineer understood the full system. I learned to approach problems with general understanding and ask experts quick questions when I needed clarity on specific decisions.
3
Adoption over elegance
The most beautiful solution is not the most used. The Slack alert wasn't elegant, but it worked. It poked engineers when something needed fixing rather than creating another tool they had to remember to check.
