excel

When Will Tioga Road Open?

For a few days each year Tioga Road, the two lane highway winding through Yosemite National Park, is open to only cyclists. But when will these days be?

Github Kaggle

Introduction

For a few days each spring Tioga Road, the two lane highway winding through the highest elevations of Yosemite National Park, is open to cyclists with no car traffic once the snow has been plowed. It's a dream bike ride for an avid cyclist like me, but it's almost impossible to plan for. The National Park Service refuses to speculate on when they'll have the snow out of the way and the area free of high avalanche risk. This is frustrating and in need of remedy for cyclists who are relegated to figure out travel and lodging on a moment’s notice.

Thankfully, the National Park Service provides historical road opening dates and the USDA has historical snow data. I'll be diving into both of these datasets for this project.

Goal

Predict when Tioga Road will be ready for cyclists in the spring of 2023.

Data Collection

My first step in this process was to collect the data I'll need. The National Park Service provides historical opening dates for Tioga Road dating back to 2011. Because this simply is not much data, I’ve decided to also look at the historical opening dates for three other mountain passes in the Sierra Nevadas – Monitor Pass, Sonora Pass, and Ebbetts Pass.

The next step is to find historical snow data for these four locations. The Natural Resources Conservation Service, a division of the United States Department of Agriculture (USDA) maintains historical snow data for the entire country.

This, unfortunately, is where we hit our first problem. The snow data collection stations located within Yosemite National Park (White Wolf, Tenaya Lake, Tuolumne Meadows, Dana Meadows, and the Tioga Pass Entry Station) have inconsistent and out-of-date data. I’ll begin working with data from Tuolumne Meadows first as this seems to have the most data of all the locations within Yosemite. However, as an alternative should this not prove to be enough data/be too unreliable, the two stations located next closest to Tioga Road and with similar elevations are Virgin Lakes Ridge and Mammoth Pass. I suspect I will ultimately need to rely on these locations as proxies for the snow data on Tioga Road.

I also collected historical snow data for Monitor Pass, Sonora Pass, and Ebbetts pass. Considering the issues surrounding Yosemite’s snow data, including these in my analysis will hopefully add extra confidence.

With our locations and access to snow data set I then had to decide which metric I wanted to track in relation to snow fall. I decided to use snow water equivalent (SWE) as this measures the amount of water within a snowpack at a specific location. In other words, it can be thought of as the depth of water that would result if the entire snowpack were to melt. The Environmental Protection Agency uses this metric to track snow quantity because, "snow can vary in density (depending on the degree of compaction, for example)" so "converting to the equivalent amount of liquid water provides a more consistent metric" than things like snow depth. SWE will be my main point of study, but I will also collect data on snow depth simply out of curiosity.

I downloaded all of the data described previously and created a new Excel spreadsheet.

This is what the original data I collected looked like for Monitor Pass. Data collection of SWE began in 1991 while snow depth collection started in 2002.

Cleaning

The datasets for Monitor Pass, Sonora Pass, Ebbetts Pass, Virginia Lakes, and Mammoth Pass were all consistently good (only Mammoth Pass was missing data for 2013), but the Tuolumne Meadows dataset was a different story. There were a lot of nulls in the Tuolumne Meadows dataset for stretches of years and there simply wasn't data collected within the last two years. The dataset also contained data for all twelve months of the year while the others only had data for Jan-Jun.

The Tuolumne dataset had irrelevant months.

I attempted to clean the Tuolumne Meadows dataset in an effort to salvage it and keep it useful despite all the missing data. Before making any changes though, I made a copy of each dataset in case a mistake was made and then I added a column with the month for each datapoint written out.

Here you can see the extra Month column made by using the formula =TEXT(cell, 'mmm') as well as the format of the date column being changed.

I ran a filter to display July-Dec and then deleted them from the Tuolumne Meadows dataset. As there's no snow on the ground in the summer, it's entirely useless for this project.

Unfortunately there's still missing data and nothing we can do about it.

Unfortunately, there's still a lot of missing data and there's not much we can do about it. From this point on, I'll largely abandon this dataset in favor of Virgin Lakes Ridge and Mammoth Pass as proxies for snow data in Yosemite National Park.

Snow Depth vs. Snow Water Equivalent

Before moving forward, my curiosity got the best of me and I wanted to know the relationship between snow depth and snow water equivalent. I was able to get snow depth and snow water equivalent data for Monitor Pass, Sonora Pass, Ebbetts Pass, and Virginia Lakes so I combined the data from all four sheets and made a quick graph.

The relationship between snow depth and snow water equivalent at Monitor Pass, Sonora Pass, Ebbets Pass, and Virginia Lakes.

It's a fun graph to look at, but it's not surprising. If there's deeper snow, you'll also have more water trapped within. With my curiosity satisfied, we can continue with the main event.

Snow Water Equivalent

I think it will be good background knowledge to know the average snow water equivalent for each location as I would assume more snow = later opening date.

Using a simple AVERAGE formula I got the following results:

Location	Avg SWE from Jan - June	Years
Monitor Pass	8.66	1991-2023
Sonora Pass	16.20	1979-2023
Ebbetts Pass	23.68	1979-2023
Virginia Lakes	11.74	1979-2023
Mammoth Pass	25.15	1990-2023

Scroll to view the full table.

It always amazes me how different the weather can be even when these places are within about 75 miles of each other. However, it's unsettling that Virginia Lakes and Mammoth Pass have such different averages since they're standing in as proxies to the poor quality data along Tioga Road. We carry on though!

Diving Deep Into Snow Water Equivalent

2023 has been a historic winter for the Sierra Nevadas. We can clearly see that in these datasets, but it's unclear how historic 2023 has been based on the data we currently have. For that reason, I want to create an entirely new metric, which I'll call SWE+. This statistic will contextualize each month's recorded SWE for every location on an easy-to-understand scale.

This will be easier to understand as we move forward so let's start with Monitor Pass. I will need to find the average SWE for each month (so Jan-Jun). I'll filter the data for each and take the averages that way. I'll also count how many data points I have for each month as a way to ensure the data I have is consistent throughout all years of collection.

![Monthly averages of SWE for Monitor Pass](https://cdn.sanity.io/images/38t6rk2v/production/9fcebc14ccfa136cdffcbbc5e08e8ff639e7db1a-1552x989.png "The monthly averages were calculated using the following formula: =SUBTOTAL(1, cell range). The count was calculated with a similar formula: =SUBTOTAL(3, cell range).)

The counts are slightly off, but that's only because data hasn't been captured for April, May, and June this year (it's only March!). That's great confirmation that the data was properly cleaned and dealt with. And now we have each month's average SWE for Monitor Pass.

Now, we can divide the actual SWE by the average SWE and multiple by 100. This will be our new statistic I mentioned before, SWE+. An SWE+ of 100% means that month's SWE was exactly average compared to the area’s normal SWE for that month. An SWE+ of 110%? 10% more than average. An SWE+ of 90%? 10% less than average. Rather than having to guess if a SWE of 2.2 in January at Monitor Pass is a lot, we can look immediately at its SWE+ of 37% and know Monitor Pass received 63% less than average for that time of year historically!

I'll add this new SWE+ column into each location's sheet. I'll again filter the data for each month and enter the formula =D4/$H$4. I'll drag this formula down for each data point collected and then format the cells to be percentages.

SWE+ was added so we can immediately know if SWE was above or below average for that time of year.

We can make this even easier to quickly determine which months received unusual amounts of snow with conditional formatting. I'll highlight the top 10% with an icy blue (the months with abnormally high snow levels) and a brown for the bottom 10% (the months with abnormally low snow levels).

With conditional formatting, it's quick and easy to see that, yes, 2023 has been incredibly snowy with sometimes over 300% more snow than usual for that time of year.

If you filter the data just for the months Jan-Mar, thanks to our conditional formatting we can see that 2023 is the only year with all three months in the top 10%! Wow, what a year for snow in 2023!

We can put this filtered data into a graph, too.

SWE+ for the months of January-March at Monitor Pass from 1991-2023. We can clearly see how much of an outlier this year has been.

SWE+ for the months of January-March at Sonora Pass from 1991-2023.

SWE+ for the months of January-March at Ebbetts Pass from 1991-2023.

SWE+ for the months of January-March at Virginia Lakes from 1979-2023.

SWE+ for the months of January-March at Mammoth Lakes from 1990-2023.

Onto Opening Dates

Now that we have all this great information under our belts we can move onto how this relates to the actual opening dates of these passes. As a reminder, we only have opening date data for Monitor Pass, Sonora Pass, and Ebbetts Pass. We will then be using the Tioga Road opening dates with the snow data from Virginia Lakes and Mammoth Pass.

I'd like to find the average date each location was opened. I could do this with a simple =AVERAGE function, but if I did this for Sonora Pass for example, the average would be 11/8/16. It's entirely useless – these roads don’t open in the winter!

The year in these dates is what's throwing everything off. Instead, we'll need to standardize it with an arbitrary year. I've chosen the year 2000 for no particular reason. I'll create a new column, Date 2000, and we'll populate it with the following formula: =DATE(2000, MONTH(A5), DAY(A5)). This will keep the month and day the same, but change the year to 2000. Now when we calculate the average opening date, we get 5/10. Much better! I've also populated these tables with the corresponding SWE and SWE+.

The dates have been changed so we can calculate an accurate average.

As we can see, Tioga Road has the latest opening dates on average, while Monitor Pass has the earliest (although this is skewed heavily by 2014 when it opened in January).

Let's compile all the opening dates and their corresponding SWE+ on opening and make a graph of the resulting table. I would assume we'll have an exponential-type trend line.

Each location's opening date plotted with their SWE+.

Okay, maybe exponential? But regardless, there certainly is a positive relationship between a later opening date and a larger SWE+,

Conclusion

The National Park Service refuses to make a prediction when Tioga Road will be open for cyclists. I now understand why. While I think I made some good headway in better understanding and contextualizing snowfall in the Sierra Nevadas, the lack of opening date data really holds any meaningful conclusions back. We have, however, seen that a higher SWE+ typically indicates a later opening date. With significant SWE+ across all locations in 2023 I feel confident saying Tioga Road will be useable much later than the typical opening date of May 27th. Beyond that, with the data I have, would simply be a guess.

Learning Take-Aways

Having good data is key. I struggled along with the Tuolumne Meadows dataset despite knowing that it probably wouldn't be adequate to answer my initial question despite it being geographically perfect. Turning towards the alternate Virginia Lakes and Mammoth Pass datasets earlier and understanding that any potential conclusions would have to be tempered would have been beneficial.

Don't mess with the original data. When I initially began this project I immediately tampered with the original datasets. Without making new columns or new sheets I altered data types, deleted some rows in support of data cleaning, and began making calculations. Of course, I ran into problems with the calculations and needed to backtrack but because I had altered the original data, I had to restart from scratch. This would have been completely remedied by simply making new columns/sheets and keeping the original data as is.

Sample size is important. While it was literally impossible to have a larger opening date sample size than what I collected, it's importance is not lost. Any conclusions I was able to come to with these datasets was hampered by a lack of actual data points. It will be fun coming back to this project in say 10, 20, 30 years from now when there is more opening date data and make more meaningful predictions.