The biggest barrier to big data analysis with Hadoop is often getting data to a place where it’s accessible for analysis in the first place. Last week, our CTO Ian Hamilton addressed the topic in some depth for Data Informed, an online publication covering big data and analytics for business and IT professionals.
“As organizations with new big data analytics initiatives look to utilize Hadoop, one critical step is often forgotten,” said Hamilton. “Unless your data is being captured in the same place where it will be analyzed, companies need to move large volumes of unstructured data to readily accessible shared storage, like HDFS (Hadoop distributed file system) or Amazon S3, before analysis.”
Here are the highlights:
Unstructured data such as video or sensor data is typically generated at distributed points that can be very far from each other, and even farther from accessible shared storage. A single minute of HD video recorded at 50 Mbps amounts to about 375 MB, so gathering extended footage from multiple cameras quickly exceeds what common transfer methods such as FTP can handle, even over very high-bandwidth connections.
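To put those numbers in perspective, here is a quick back-of-the-envelope sketch. The camera count, shoot length, 1 Gbps link, 64 KiB window and 100 ms round-trip time are hypothetical values chosen purely for illustration. The window/RTT ceiling is a standard upper bound on the throughput of a single TCP connection. We are assuming here, rather than quoting the article, that this is one reason single-stream tools such as FTP fall short over long distances even when plenty of bandwidth is available.

```python
# Back-of-the-envelope sizing for moving video footage to shared storage.
# All scenario numbers below are illustrative assumptions, not measurements.


def bytes_per_minute(bitrate_mbps: float) -> float:
    """Bytes produced by one minute of video at the given bitrate (megabits/s)."""
    return bitrate_mbps * 1e6 * 60 / 8  # bits per second * seconds / bits per byte


def transfer_seconds(total_bytes: float, throughput_bps: float) -> float:
    """Ideal transfer time at a sustained throughput given in bits per second."""
    return total_bytes * 8 / throughput_bps


def tcp_window_ceiling_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP stream's throughput: window size / round-trip time."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    per_min = bytes_per_minute(50)           # 375,000,000 bytes, about 375 MB
    cameras, hours = 10, 8                   # hypothetical multi-camera shoot
    total = per_min * 60 * hours * cameras   # about 1.8 TB of footage
    link_bps = 1e9                           # hypothetical 1 Gbps link
    # Hypothetical single TCP stream: 64 KiB window (no window scaling), 100 ms RTT.
    ceiling = tcp_window_ceiling_bps(64 * 1024, 0.100)

    print(f"one minute at 50 Mbps: {per_min / 1e6:.0f} MB")
    print(f"total footage: {total / 1e12:.2f} TB")
    print(f"at a full 1 Gbps: {transfer_seconds(total, link_bps) / 3600:.1f} hours")
    print(f"single-stream ceiling: {ceiling / 1e6:.2f} Mbps")
    effective = min(link_bps, ceiling)       # the slower of the two limits wins
    print(f"at that ceiling: {transfer_seconds(total, effective) / 86400:.1f} days")
```

Under these assumptions, the same footage that would take a few hours at full link speed would take roughly a month when throughput is capped by a single small-window stream. This gap is why raw bandwidth alone does not solve the problem.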
Many professionals outside digital film production and post-production are unaware that a solution for transferring such huge unstructured data sets has existed for over a decade. The core technology that Hollywood studios have used for person-to-person transfers of very large uncompressed footage has recently been incorporated into SaaS solutions that connect with cloud storage.
“It wasn’t until the cloud revolution and the development of SaaS/PaaS (software-as-a-service/platform-as-a-service) solutions that this technology became relevant and accessible to everyone else,” said Hamilton.
As more and more companies adopt cloud-based big data analytics such as Hadoop, or any other cloud application that requires moving large data volumes, we may see wider adoption of the technologies that helped make feature films and episodic television what they are today. Which is kind of interesting, no?
If you haven’t already, check out the full article by Ian Hamilton on Data Informed.