Three ways the Baseball Hall of Fame explains tiered storage strategy using Amazon S3 and Glacier
For the past three years, my son has played in a baseball tournament in Cooperstown, New York, the birthplace of America’s favorite pastime and home to the Baseball Hall of Fame. An experience rich with history and heroism, the Hall of Fame is always a reminder of the role baseball has played in breaking down social barriers and reviving what we hope is something essential in the American spirit.
I would highly recommend it to any baseball or sports fan in the world. And if you happen to be an IT pro as well, the layout of the museum offers some intriguing insights. Once you pass through the archway into the grand entrance hall, many of the most popular exhibits featuring favorite teams and players are easily accessible. From there, you can navigate the maze of the museum, including the archives of baseball, its origins and highlights of defining games already won or lost.
Over a series of visits, I noticed that more timely and current exhibits are at the forefront of the hall while access to the deeper history requires exploring further in. And artifacts that most people don’t see or are not of common or current interest are stored far off the beaten path with less attention and cost paid to maintenance and operation.
With the annual Amazon re:Invent conference upon us, I’ve noticed how much this physical experience, or any major museum experience for that matter, is similar to a digital tiered storage strategy.
Tiered storage strategies help you govern digital content and data sets and provide access at the speed each one needs: quick access in Amazon S3 for Big Data analytics or for nearline data used in trending analysis, and deeper archives in Amazon Glacier.
A key factor of a successful tiered storage strategy is optimizing the use of each tier, whether that’s on-premises, nearline/cloud storage, or reduced redundancy/archive storage. The faster you can move data from on-premises to the cloud, the better you can take advantage of finite and typically highly valued on-premises storage.
Matt Yanchyshyn from Amazon notes in his recent Big Data Blog series about moving assets into and out of the cloud that, “One of the biggest challenges facing companies that want to leverage the scale and elasticity of AWS… is how to move their data into the cloud. It’s increasingly common to have datasets that are multiple petabytes. Moving data of this magnitude can take considerable time.”
Speed matters for both storage and analytics
As organizations continue to adopt cloud storage and implement tiered storage strategies (factoring in cost, location, and file availability), how fast you can move your files into and out of cloud data storage is a key component of making this strategy both cost-effective and operationally efficient. And that’s particularly true when you have many gigabytes of data and/or are maintaining enterprise storage, considering the underlying operational and CAPEX costs.
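To put "considerable time" into numbers, here is a back-of-the-envelope sketch in Python. The function name, the decimal petabyte, and the `efficiency` parameter (the fraction of the link you actually manage to fill) are my own assumptions for illustration, not figures from Amazon. On a fully utilized 1 Gbps link, one petabyte works out to roughly 93 days.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is the assumed fraction of the link's nominal rate that the
    transfer actually achieves (1.0 means the link is completely filled).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    seconds = (size_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, in bytes
GBPS = 10**9       # one gigabit per second, in bits per second

# Compare a few hypothetical links, each at full and at half utilization.
for label, rate in [("1 Gbps", GBPS), ("10 Gbps", 10 * GBPS)]:
    for eff in (1.0, 0.5):
        days = transfer_days(PETABYTE, rate, eff)
        print(f"1 PB over {label} at {eff:.0%} utilization: {days:.1f} days")
```

The point of the exercise is the `efficiency` term: halving the achieved throughput doubles the calendar time, which is why how well a transfer fills the pipe matters as much as the size of the pipe.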
While storage and archives are still the most common reason to upload to Amazon S3, access to cloud-based analytic tools is attracting large amounts of data to S3 as well. And many people are looking for solutions to accelerate the otherwise slow transfer to cloud data storage. “As analytics workloads continue to expand in the cloud,” says Yanchyshyn, “these large file-delivery solutions are being applied to data transfer for big data.”
Three ways tiered storage strategy is like the Baseball Hall of Fame
To crystallize our metaphor, here are three ways that tiered storage strategy is like the layout of the Baseball Hall of Fame:
1) The Hall of Fame overall is your cloud storage and Amazon S3 is the grand entrance hall in your storage plan. You keep current and key items you need access to in that main hall, which are your nearline digital assets.
2) From there you can quickly tap into other key services that you may need to complete your workflows. For media and entertainment, that could be services like transcoding. For analytics, that could be Hadoop and MapReduce. Those items are all easily accessible right off of S3, just as the up-front exhibits at the Hall of Fame give current highlights of your favorite team from this season or last.
3) For many assets though, you don’t need to keep them in the main hall. If you have an asset that you don’t need to access quickly, and you want to be smart about how you use your storage budget and resources, that asset can go to the archives, or in this case Amazon Glacier, a lower cost archival storage option that trades slower retrieval for a lower price.
For example, you may have a historic set of assets that you only need to pull up from the archives at specific times or based on trending events, such as an archived segment from a reality TV show that’s suddenly gotten attention, or in the baseball world, perhaps Curt Schilling’s bloody sock from the 2004 World Series. After all, it is the 10th anniversary of one of the most spectacular World Series (if you’re a Red Sox fan, that is).
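Moving an asset from the main hall to the archives does not have to be a manual chore. As a minimal sketch, an S3 lifecycle rule can transition objects to Glacier once they reach a certain age. The helper name, the `episodes/2004/` prefix, the 90-day threshold, and the bucket name in the comment are all hypothetical; the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` expects.

```python
def glacier_lifecycle(prefix: str, days_until_archive: int) -> dict:
    """Build an S3 lifecycle configuration that moves objects under
    `prefix` to the GLACIER storage class after `days_until_archive` days."""
    if days_until_archive < 0:
        raise ValueError("days_until_archive must not be negative")
    # Derive a readable rule ID from the prefix (illustrative naming only).
    slug = prefix.strip("/").replace("/", "-") or "all"
    return {
        "Rules": [
            {
                "ID": f"archive-{slug}-after-{days_until_archive}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical example: archive a past season's segments after 90 days.
config = glacier_lifecycle("episodes/2004/", 90)
print(config)

# To apply it (requires boto3 and AWS credentials; the bucket name is made up):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-media-archive", LifecycleConfiguration=config)
```

Keep in mind that pulling an object back out of Glacier takes a restore request and some waiting time, so a rule like this suits assets you only expect to revisit occasionally.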
Accelerated upload to Amazon S3 and Glacier
These three points all tie back to how you get your data into or out of the cloud. For each of these scenarios, speed matters, because if you can’t get your data into or out of cloud data storage quickly, you can’t capitalize on other cloud services. And you aren’t using your on-premises storage as effectively as you could by moving nearline assets to S3 and archives to Glacier.
Choosing an accelerated file transfer solution to move your files into and out of cloud data storage is critical. And choosing one that is SaaS-based means you can get started in minutes, compared to having to acquire, provision, manage and run your own licensed software.
After all, the less you have to think or worry about how you are going to get your files into the cloud, the more you can focus your time and resources on being at the top of your game.
If you are attending Amazon re:Invent (November 11-14), stop by our booth (#448) and nerd out with us about your tiered storage strategy, baseball or Sabermetrics!
In the meantime, check out our SaaS solution, Signiant Flight, to upload or download files to S3 quickly.