Shifting to nationwide distance learning – a challenging start with a brighter future
During the week of March 30, Schoology had intermittent outages on three days between 10am ET and 12:30pm ET. The result was a lower-quality experience than you’re used to, and lower than you deserve. As one of Schoology’s founders, its former CEO, and now SVP Schoology at PowerSchool, I am disappointed, and I take accountability. At the same time, I am more confident than ever that we can meet the needs of all our customers and users. The note below is meant to provide transparency into the issues we faced, what we did to resolve them, and how we will do better moving forward.
Background of Schoology
When we created Schoology just over ten years ago, our mission was to advance what’s possible in education, and to help enable personalized learning opportunities for every student in the world.
We’ve grown significantly over the years and now serve more than 20 million users, from independent teachers to thousands of schools and districts, including some of the largest state-level and large-district deployments in North America. Beyond the ease of use, the built-in collaboration, and the other features and functionality that people love, reliability has been a major factor in our success. In 2019, the team was proud to announce that we had achieved 99.998% uptime.
Schoology is one of the most scalable platforms in K-12 education. The product is hosted 100% in the cloud on Amazon Web Services and uses state-of-the-art automation and technology, such as AWS Aurora, S3, Redshift, CloudFront, and others. Today, with the scale and investment that come with being part of PowerSchool, we’ve been able to advance what’s possible even further and join the host of PowerSchool solutions that reach 45 million students. We have the experience and the resources to scale Schoology to any level.
Preparing for the Impact of COVID-19 on Schoology
Every person who works at Schoology does so because they believe in and are passionate about the mission. And while we all feel that our platform can help advance what’s possible, we never imagined that our service would become so critical in addressing the challenges of an overnight pivot to distance learning by 100% of our customers.
As soon as educators and government leaders signaled that schools might close due to the COVID-19 pandemic, everyone at PowerSchool and Schoology took immediate steps to prepare our systems. This included modeling a large increase in the number of users accessing the system, investing significantly in our hosting infrastructure automation, adding capacity to serve more users, and expanding our monitoring capabilities so that we could scale our platform.
Usage rose steadily throughout March, but during the week of March 30, as educators returned from Spring Break, it jumped by more than 400%, with the heaviest load during our peak window of 10am ET to 12:30pm ET. This increase came on top of the growth we had already been seeing throughout the month.
We knew there would be a large increase in usage, but we didn’t predict that it would exceed 400% in a single day. The volume of usage, and the way the platform was being used, stressed our system in new ways and caused bottlenecks we had not planned for. Although we expected more people to use Schoology, we could not predict how school closures would fundamentally change user behavior, or that this change would happen at massive scale and virtually overnight.
Google Integration and the Rate Limit Issue
At the same time as we hit our highest peak usage, we noticed that our Google Drive Assignment functionality began to degrade. On further investigation, we found that our integration with Google was hitting the Google Drive API rate limits. Executives at PowerSchool connected with executives at Google to fast-track a significant increase to our rate limits. This not only fixed our immediate issues but also put measures in place to ensure the long-term success of the integration. Notably, this seemingly small issue was a major cause of the outages we experienced.
To best support everyone during peak usage periods, we occasionally had to turn off some processes and limit certain features, including staggering access to the system for short periods during peak times. These short-term measures allowed us to maintain service levels and preserve the experience for all our users while we continue to implement the long-term solutions that will keep us from having to disable these features in the future.
We worked with our hosting provider, Amazon Web Services (AWS), to improve performance and scalability holistically. We continuously monitored and tuned different parts of the platform to support the massive increase in users and the shift in usage patterns. We identified and addressed areas of the site where performance suffered under the increased usage, and we continue to invest in further improvements.
Great progress and continued focus
Over the past week, Schoology has performed very well, even as record numbers of users and the new usage patterns have persisted. This is a testament to our teams, who have worked 24x7 to address issues and made hard decisions in the best interest of the whole community of users.
Our teams have done an amazing job of identifying which areas of the system need to be monitored, what specific actions to take during peak hours, and how to ensure a great user experience for everyone. If the need arises, we may still have to stagger users and throttle features during peak hours to provide the best experience for the entire user community. But I also want everyone to know that we have made significant investments and are continuing to enhance our platform so that it can support and scale for even higher usage, today and into the future.
Also, moving forward, we will not only continue to update the status page when issues arise but also do a better job of communicating proactively. There is no perfect way to communicate which features are being turned off, and for how long, but we will pay special attention to balancing transparency against information overload.
Our Commitment and Dedication to Your Success
From the moment we started experiencing issues, our teams went into what can only be described as emergency operations mode. Our teams committed every waking hour to the steps required to restore Schoology’s performance to what our users had come to expect and to ensure that we could continue to support your mission moving forward. More than 100 people across PowerSchool and Schoology worked through the night and the entire weekend to make many of the improvements discussed above. Not a single person complained. Instead, the team rallied together for the mission, so that every one of our customers and users could return to the experience they are accustomed to.
The current need for Schoology and the role it can play during this critical time is exactly what drives our entire team as we continue to work around the clock to make sure Schoology lives up to the moment we are all in together.