When a live bug is reported to you, what are your best practices for handling it?
Personally, I first understand the issue impacting the user, then see if I can replicate it. Then I try to get a sense of how big the issue is, and I rope in my QA/Dev to investigate.
My problem here is that sometimes the bug analysis takes a long time and I end up joining a working call where the devs discuss and conduct the analysis, but I’m not sure where I can help besides waiting for them to tell me the ETA of the bug fix. What do you do during this period?
We have SLA priorities. They’re a good way to measure urgency. Bugs that fundamentally break the application are P0: everyone drops everything. P1 is a fundamental issue for a particular customer, which usually means the on-call engineer drops everything. P2 usually means a major issue with a temporary workaround, which the on-call engineer works on as soon as they’re free. P3 and below go to the backlog, where they’ll be prioritized properly.
Standardize your priorities. Give them definitions. Define responses to each level. You don’t have to copy the above, but something close works.
We use this:
P0: Core functionality is broken for multiple customers
P1: Core functionality is broken for a single customer, or secondary functionality is broken for multiple customers
P2: Secondary functionality is broken for a single customer, or tertiary functionality is broken for multiple customers
P3: Tertiary functionality is broken for a single customer
P4: Cosmetic defects
Presence of a workaround drops priority by one. A sufficiently large/loud customer can raise priority by one.
We have definitions for core, secondary, and tertiary functionality. If something comes up and we’re not sure which level of functionality is affected, we assume the worst of the likely options until we prove otherwise.
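The priority rules above are mechanical enough to sketch in code. Below is a minimal Python sketch, assuming a hypothetical `Level` enum for the functionality tiers and a hypothetical `priority()` helper; it is one way to encode the rules as stated in this post, not that team’s actual tooling. It returns the P-number as an integer (0 = P0), takes the most severe of the candidate levels when the affected tier is uncertain, and clamps the result to P0 through P4.

```python
from enum import IntEnum
from typing import Iterable


class Level(IntEnum):
    # Hypothetical encoding: lower value = more important functionality.
    CORE = 0
    SECONDARY = 1
    TERTIARY = 2
    COSMETIC = 3


def priority(levels: Iterable[Level], multiple_customers: bool,
             has_workaround: bool = False, loud_customer: bool = False) -> int:
    """Return a priority number from 0 (P0) to 4 (P4)."""
    # Unsure which tier is affected? Assume the worst of the likely options.
    level = min(levels)
    if level == Level.COSMETIC:
        base = 4  # cosmetic defects are P4
    else:
        # Core + multiple customers = P0; dropping one tier, or going from
        # multiple customers to a single one, lowers priority by one step.
        base = int(level) + (0 if multiple_customers else 1)
    if has_workaround:
        base += 1  # a workaround drops priority by one
    if loud_customer:
        base -= 1  # a large/loud customer raises priority by one
    return max(0, min(4, base))
```

For example, `priority([Level.CORE, Level.SECONDARY], multiple_customers=False)` treats the bug as core (the worst likely option) for a single customer and returns 1 (P1); passing `has_workaround=True` as well drops it to 2 (P2).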
We likewise have procedures for engaging on-call, what information to provide to customer-facing teams, when to send an email blast to customers (thankfully rare), if/when we do an emergency release or a hotfix/patch, and so on.
All tie back to priority and the definitions above.
@DhirajMehta, love this! It helps maintain focus and momentum. The shippers can keep shipping, and firefighters get pulled in when the damage is critical and the fix is urgent.
If it’s a non-urgent bug, it should go in your backlog for prioritization. If it is urgent, you and the team definitely need to fix it as a priority; otherwise, add it to your backlog (if you are running Scrum) so it can be prioritized and fixed in the appropriate sprint.
Don’t hang around if you can’t help. This is a best practice across pretty much the entire PM function: don’t do someone else’s job for them. You will never be as good at it as they are (and if you are, you have different problems), and you are stealing time from yourself that you could spend on other things. Your primary functions are to facilitate, remove blockers, and provide direction in the form of vision and decision-making.
I did the same as you early on in my PM journey: I would investigate, replicate, then try to help QA/Dev get to the root of the issue. I will say that I don’t regret it at all. It was very useful to me at that point because it let me figure out how a lot of things fit together, how to track down bugs, and how to roll out bugfixes.
As I got more experienced, though, I ran into all of the tradeoffs that come with broader coverage, and bug hunting became too big a time sink because it was a) something other people were better at and b) something that stole time from other things I could be doing.
Now I will help with high-priority bugs until they are handed to the dev team. Then I ask them to update me on what they’ve found regarding difficulty and time to fix, and I go do something else. That gives me what I need for prioritization, and in truth, making the decision is the most important thing I can provide.
Bugs = written code not acting as designed. Everything else = change requests for product to process.
- Make sure the team gets steps to reproduce
- Prioritize: major (uptime/security/stops conversion) or minor (inconvenience/opinion)
- Resolve hot issues ASAP. Everything else goes in the backlog
- Find out why the bug made it past process and plug the hole
What has worked for my teams in the past is a standardised scoring system:
- 1 to 5 on impact: impact is how badly the software is affected. 1 = typo or visual bug that has no impact on product usage; 5 = major feature is completely non-functional.
- 1 to 5 on reach: reach is the number of users who have used that feature in the last 90 days. 1 = a handful of low-value clients; 5 = more than 50% of high-value clients.
In my last company, we were able to get the second-level support team to do the initial research and propose impact/reach scores for every bug ticket, meaning that PMs had some idea of how quickly they needed to look at a given ticket.
Multiply these two numbers to get a 1 to 25 bug score. Squads have an SLA to fix 25 immediately and address 20 (not necessarily ship fixes) within a day. Everything else goes into the backlog and is triaged in due course, with the squad deciding if/when these are addressed.
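The scoring scheme reduces to a multiplication and two thresholds. A minimal Python sketch, where `bug_score()` and `sla()` are hypothetical names and the returned strings simply paraphrase the SLAs described in this post:

```python
def bug_score(impact: int, reach: int) -> int:
    """Impact (1-5) times reach (1-5) gives a bug score from 1 to 25."""
    for name, value in (("impact", impact), ("reach", reach)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return impact * reach


def sla(score: int) -> str:
    """Map a bug score to a response (thresholds taken from the post)."""
    if score == 25:
        return "fix immediately"
    if score >= 20:  # with 1-5 scales, the only score here is 20 (4 x 5)
        return "address within a day"
    return "backlog"  # triaged in due course by the squad
```

For example, `sla(bug_score(4, 5))` scores 20 and returns "address within a day", while `sla(bug_score(3, 5))` scores 15 and lands in the backlog.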
We measure the total bug scores in each team’s backlog to understand how much effort we are putting into bugs and whether we need to step in to understand why a squad is ignoring bugs or encourage them to pick up more. That last point is much more a strategic look at whether squad goals and incentives are aligned with the business than just pushing bug tickets into a priority lane.
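Tracking the total backlog bug score per squad is a simple aggregation. A sketch, assuming open tickets are available as hypothetical `(squad, impact, reach)` tuples and that each ticket’s score is impact times reach:

```python
from collections import defaultdict


def backlog_totals(tickets):
    """Sum bug scores (impact x reach) per squad for open backlog tickets.

    `tickets` is an iterable of (squad, impact, reach) tuples, a
    hypothetical shape chosen for illustration.
    """
    totals = defaultdict(int)
    for squad, impact, reach in tickets:
        totals[squad] += impact * reach
    return dict(totals)


tickets = [("payments", 4, 3), ("payments", 2, 2), ("search", 5, 1)]
print(backlog_totals(tickets))  # {'payments': 16, 'search': 5}
```

A squad whose total keeps climbing is the signal to step in and ask why bugs are not being picked up.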
Quick prioritization: Does it severely impact major flows? Is everyone affected or just edge case users? How much is this going to affect the user experience and the customer service department? Does it have monetary consequences (e.g., do we need to pay something back or lose out on revenue)?
If it’s not critical I’ll prioritize it amongst other to-dos (and that might mean it gets done next week or it might mean it gets done never, if it only affects every 100000th person when they log in from Internet Explorer 3.2).
I give bug reports in the following structure:
Steps to reproduce (I’ll always try to reproduce first, it’s not always possible but I’ll try), expected result, actual result, any ids or browser info.
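The report structure above maps naturally onto a small record type. A minimal Python (3.9+) sketch, with `BugReport` and `to_text()` as hypothetical names, that renders the fields in the same order: steps to reproduce, expected result, actual result, then any IDs and browser info.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    steps_to_reproduce: list[str]
    expected_result: str
    actual_result: str
    ids: list[str] = field(default_factory=list)  # e.g. user, order, or request IDs
    browser_info: str = ""

    def to_text(self) -> str:
        """Render the report as plain text, numbering the reproduction steps."""
        steps = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.steps_to_reproduce, 1)
        )
        lines = [
            "Steps to reproduce:",
            steps,
            f"Expected result: {self.expected_result}",
            f"Actual result: {self.actual_result}",
        ]
        # Optional fields are only included when present.
        if self.ids:
            lines.append("IDs: " + ", ".join(self.ids))
        if self.browser_info:
            lines.append(f"Browser: {self.browser_info}")
        return "\n".join(lines)
```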
If I cannot reproduce it, I’ll schedule a time-boxed investigation (not time-boxed if the issue is critical, e.g., our whole page is down, payments don’t work, users cannot complete core functions, and so on).
Unless it’s causing tons of daily calls to customer support: log it, throw it in the backlog, and forget about it until the backlog becomes unwieldy. Then, close it or leave the company so the next sucker can inherit the problem.
At least that has been my experience, typically as the sucker.
@LawrenceMartin, I just completed a month-long exercise of rejecting a ton of bugs as stale: the customer had churned, or literally everyone involved in the bug and its discussion was no longer there.
I figure it’s a useful thing to add to my resume (following that earlier thread about putting interesting numbers on your resume that no one but you has any way to validate): “Through a bug triage process, I reduced the known-bugs list by 58% in the span of 11 months.” At least 38% of it was rejected due to age.
I delegate to my QA to research what’s going on; a ticket gets created, which I prioritize later on.
I just observe the flow of the debugging and analysis, and at the end I note what we did wrong in the flow that made it take as long as it did. I look for a specific check that would have gotten us there faster, and I note what kind of alert/dashboard/log would help us answer this better next time.
And more importantly next time I join that call, I should be able to meaningfully make suggestions on what to check.
Unless it’s a severe bug, there are reporting and escalation flows that I put in place, carried over from my days in Support. They help the reporting team determine impact and send the bug on up to engineering.
Engineers should always have time carved out for maintenance or bug handling. I try not to eat up 100% of their time with major projects.
Prioritization helps solve this problem. A bug goes to the backlog, or straight to the developers if it’s a critical issue for users. Our team has a time slot reserved for bugs in every sprint.
Double down and say it’s a feature. Don’t break eye contact. Ask to go back a slide