
Taking Customer Success Personally

Lithium Alumni (Retired)

On Friday, May 10th, at 7:25 am Pacific Daylight Time, Lithium experienced a significant network outage. However unwanted, unpleasant, and unplanned such events are, outages are a very real possibility in the world of SaaS technology. But that fact doesn’t lessen the downstream effect we know this has for our customers and their customers who depend on our technology. In the spirit of our company value of ‘being real’, I want to share with all of you what Lithium learned from the experience and where I see opportunities to continue to strengthen our performance.

 

The outage in our North America data center lasted just under 3.5 hours and created an intermittent experience for a number of our customers, both in the US and those that this data center serves abroad. We believe that the outage resulted from what is called a Denial of Service (DoS) attack on our servers, which essentially means that someone from the outside flooded our networks with spurious traffic, impacting Communities, Lithium Social Web instances and our internal email system. There is no evidence that our customers’ information stored in our systems was compromised as a result of this attack. While we continue to investigate exactly what triggered the attack, there are a number of immediate steps that we are taking, including:

 

1. Fortifying our edge network with increased capacity and additional DoS prevention features to withstand traffic floods and thwart DoS attacks.

2. Improving our monitoring system so it can identify the attacks faster.

3. Practicing our ability to quickly route our traffic through our existing DoS prevention partners. Our partners can strip out the spurious traffic and route clean traffic to us, giving us the ability to withstand a DoS attack.

 

I know many of you often hear me talk about the values of Lithium, and Friday’s situation reflected two of these values very accurately: Take Customer Success Personally and Learn Fast, Act Faster. For me, Friday was a solid testimony of how well Lithium is living up to its core values.

 

While the engineering and tech ops team worked urgently to solve the network issues, our sales, service and marketing teams rallied to provide transparent communication to affected customers in short order, leveraging both social channels and outbound call downs when email failed us. Our performance here was strong and in my eyes reflective of just the type of culture I want to see thrive at Lithium.

 

I want to reiterate that we are committed to being a world-class SaaS company. We are proud of the service levels we provide customers and I want you to know just how seriously Lithium takes any disruption in our service to you and your customers. You rely on us and I can only assure you that what transpired on Friday demonstrates what you can expect from Lithium: quick recovery, full transparency, and rapid incorporation of what we learn back into the business.

 

 

5 Comments
Not applicable

Rob,

 

Great post but unfortunately our experience was not that great. Our call to the support number went unanswered, and the next number we tried also went unanswered.

 

Our technical account manager's response was also lacking.

 

I do thank the people that assisted once we managed to speak to some of the good people in the US office but sadly it was not as smooth as what the post suggests.

 

Regards,

 

Valued Contributor

Thank you very much for this summary.

 

I would like to highlight the fact that I received emails and phone follow-ups during the crisis which is greatly appreciated.

 

I did feel the Lithium organization was completely behind this situation and working collectively to support us.

 

Even with this outage, Lithium was able to push changes to my community in my live environment!

 

Kudos

Lithium Alumni (Retired)

Dear CDT (anon),

 

I’m so sorry to hear about this and we'd very much like to hear more about it. Please email us at supportleads@lithium.com so we can talk about your experience.

 

Rob

Occasional Advisor

Good

We were able to leave a message on the hotline and received a follow up (prelim info, service restored, etc.).

We were able to log 2 Lithosphere cases when it was available (your own site was up and down so reaching Li was challenging).

Our account mgr responded to us even though he was on vacation.

 

Bad

We shared this with our Li POCs already.

1.  We also opened two Lithosphere cases for ISO compliance and formal tracking even though your own site was unavailable for a definite period of time during the outage.  We discovered the cases we logged were deleted by the support agent(s).  Reason:  Your company's message said we should email outage@lithium.com which appeared to be affected.  We're not supposed to log outage concerns as Lithosphere cases (or so we were told indirectly by finding out about the deletion).

 

2.  While we received a prelim update, we could have used the above information to close the loop, update our internal case tracking against this issue, and ensure we comply with SOPs.  Your detailed explanation did not make its way back to the folks who need to log this info formally for tracking and compliance internally.  It would have been nice to receive the level of info you provided in your post in written form that could have been shared to all who "need to know".

 

3.  In the future, if an event like this happens, it is not clear what the process should be.  Should we call the hotline, email outage@lithium.com, open a case in Lithosphere, or all of the above?  Also, how do we ensure we get the final, detailed info that you provided so we can update our own case notes and log for ISO compliance?

 

Rob... From this comment, I think you understand the "process" can be improved.  We're providing direct feedback so we can help Lithium learn and improve from this example.  Thanks.

Khoros Alumni (Retired)

Hi Brion, 

 

Thanks for your feedback.  I'm delighted to hear that your account manager is that dedicated - I'll pass along the kudos.  

 

I took some time to look into your feedback over the last couple of days and here's what I found:

 

1.  This was an anomaly with your case portal setup that I'd be happy to explain to you further in private due to the sensitive nature of your setup.  I do apologize for the inconvenience though and I'm working to address one-off scenarios such as this in the future.  

 

2.  We manage a customer notification list for a variety of issues, but most importantly - release notifications and outages.  If you would like to PM me, I'd be happy to ensure you are on both of those distribution lists and also confirm who we have on the list currently to see if you require any edits for your company.  

 

3.  We are working on a company-wide comprehensive plan for issues like this in the future that allows customers to continue to stay in contact with us even when our own servers are affected.  Expect more details on that soon!  

 

I'm happy to hear any other feedback you might have, so please don't hesitate to reach out should you need anything.  

 

Thanks, 

Trey Waddell 

Manager, Customer Support

Last update: 06-17-2019 02:44 AM