We recently ran into an issue with one of our clients’ sites that exposed a gap in our monitoring services.
What prompted this improvement
While we monitor many aspects of our clients’ sites, until recently we were not actively monitoring SSL certificate validity or expiration. Because of this gap, we had an unforeseen and unfortunate incident in which a client website was unavailable during early business hours and our monitoring did not alert our team.
It goes without saying that our clients’ uptime, website performance, and the general expectation that we are there for them are paramount to Linchpin’s success. This has been a core tenet of our technical work: for over a decade, we have monitored uptime and general server/application performance across all of our clients.
At a minimum, we monitor each website’s home page for a 200 status code. In less technical terms, this means the site is “Up,” is accessible, and everything is right with the world. Any other status (e.g., 404 Not Found for a missing page, 500 Internal Server Error for a code error, 502 Bad Gateway, 503 Service Unavailable, or 504 Gateway Timeout) triggers an alert to the appropriate group via email, Slack, SMS, or another channel.
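For readers curious what a basic uptime check involves, here is a minimal sketch in Python using only the standard library. It is illustrative, not the code behind our monitoring: `check_home_page`, `is_up`, and `alert_message` are hypothetical names, and actually delivering the alert by email, Slack, or SMS is left out.

```python
from typing import Optional
import urllib.error
import urllib.request


def is_up(status: int) -> bool:
    """Treat only HTTP 200 as "Up"; any other status should trigger an alert."""
    return status == 200


def check_home_page(url: str, timeout: float = 10.0) -> Optional[int]:
    """Fetch the home page and return its HTTP status, or None if unreachable."""
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is on .code.
        return err.code
    except OSError:
        # URLError (DNS failure, refused connection, TLS failure) and timeouts.
        # urlopen verifies HTTPS certificates by default, so an expired
        # certificate lands here instead of being reported as a 200.
        return None


def alert_message(url: str, status: Optional[int]) -> Optional[str]:
    """Return an alert string for a failing check, or None if the site is up."""
    if status is not None and is_up(status):
        return None
    reason = "unreachable" if status is None else f"HTTP {status}"
    return f"ALERT: {url} is down ({reason})"
```

A scheduler would call `check_home_page` on each site every minute or so and pass the result to `alert_message`, forwarding any non-empty message to the notification channels. Keeping the status check and the alert formatting separate makes each piece easy to test without network access.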
What tools do we use?
We accomplish this through our own internal tools as well as:
- New Relic for application performance
- StatusCake
- Jetpack/WordPress.com site monitoring
- Rigor, Ghost, and BrowserStack for more advanced implementations (more on this in a later post)
The incident that spurred this initiative began when a free Let’s Encrypt certificate failed to renew properly and expired. As a result, anyone visiting the website saw an insecure-website warning. Even though our monitoring was still seeing a 200 / “Up” status, the website was effectively down for visitors, and no notifications fired.
Could this happen to my site?
The series of events that led to the free Let’s Encrypt certificate not renewing correctly was quite specific, and our team worked with our hosting partners at WP Engine to determine exactly how the incident occurred. We have also adjusted our process to make sure the issue does not happen again. With that being said…
Improved monitoring for all active Linchpin clients
We are proud to announce that we are expanding our monitoring services for all active Linchpin clients to include the following supporting services free of charge.
- Full SSL monitoring, including expiration monitoring for premium/paid SSL certificates. We will notify clients 30 days, 7 days, and the day before a certificate expires. For free Let’s Encrypt certificates, which are issued for 90 days at a time, we will notify clients the day before expiration. When a certificate renews, the client will also be notified of the date it is next expected to renew. SSL checks run roughly every 30 minutes.
- Full domain monitoring, including renewal dates, WHOIS records, and more. Domain expiration/renewal checks run daily.
- Uptime checks now run at least once a minute and can be expanded to real-time monitoring on a per-client basis.
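To make the SSL schedule concrete, here is a minimal Python sketch of a certificate-expiry check using only the standard library. It is not our production tooling: the threshold constants, `fetch_not_after`, and `pending_notice` are hypothetical, chosen to mirror the 30/7/1-day schedule for paid certificates and the day-before notice for Let’s Encrypt certificates described in this post.

```python
from __future__ import annotations

import socket
import ssl
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical thresholds mirroring the notification schedule in this post.
PAID_CERT_THRESHOLDS = (30, 7, 1)  # days before expiry
LETS_ENCRYPT_THRESHOLDS = (1,)     # day-before notice only


def parse_not_after(value: str) -> datetime:
    """Convert a certificate's notAfter field (e.g. 'Jan  5 09:34:43 2018 GMT') to UTC."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(value), tz=timezone.utc)


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> datetime:
    """Connect over TLS and read the expiry date of the certificate the server presents.

    With the default context, an already-expired or otherwise invalid certificate
    fails the handshake with ssl.SSLCertVerificationError; a monitor should treat
    that as an immediate alert rather than a successful check.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return parse_not_after(cert["notAfter"])


def pending_notice(
    remaining: timedelta,
    sent: set[int],
    thresholds: tuple[int, ...] = PAID_CERT_THRESHOLDS,
) -> Optional[int]:
    """Return the most urgent threshold (in days) reached but not yet notified, or None.

    Because the check repeats roughly every 30 minutes, `sent` records which
    notices already went out so that each one is sent only once.
    """
    reached = [days for days in thresholds if remaining <= timedelta(days=days)]
    if not reached:
        return None
    most_urgent = min(reached)
    return None if most_urgent in sent else most_urgent
```

In use, a scheduler would call `fetch_not_after`, subtract the current UTC time, and pass the remaining time to `pending_notice`. A renewal would show up as a later `notAfter` date, at which point the `sent` set could be cleared and the client told the new expiry date.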
Even more features on the horizon
This is just one incremental step in improving our monitoring solution to better serve our clients, and we have more exciting features planned for the first quarter of 2020.
If you have any questions, concerns, or comments about these improvements, feel free to reach out to our team. If you are not an active Linchpin client but feel this is the type of support your partner should provide, connect with us.