The Biggest Barrier to Big Data Analytics with Hadoop

The biggest barrier to big data analysis with Hadoop is often getting data to a place where it’s accessible for analysis in the first place. Last week, our CTO Ian Hamilton addressed the topic in some depth for Data Informed, an online publication covering big data and analytics for business and IT professionals.

“As organizations with new big data analytics initiatives look to utilize Hadoop, one critical step is often forgotten,” said Hamilton. “Unless your data is being captured in the same place where it will be analyzed, companies need to move large volumes of unstructured data to readily accessible shared storage, like HDFS (Hadoop distributed file system) or Amazon S3, before analysis.”

Here are the highlights:

Most of the data gathered for analysis is not generated where it can be analyzed

Unstructured data such as video or sensor data is typically generated at distributed points that can be very far from each other, and even farther from accessible shared storage. Since a single minute of HD video recorded at 50 Mbps amounts to roughly 375 MB of raw data, gathering extended footage from multiple cameras very quickly exceeds the practical transfer capacity of common methods such as FTP, even under very high bandwidth conditions.
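To see how quickly the numbers add up, here is a back-of-the-envelope sketch. The bitrate-to-size arithmetic is straightforward; the 10% link-efficiency figure for FTP over a high-latency WAN is an illustrative assumption, not a measured value.

```python
def video_size_bytes(bitrate_mbps: float, minutes: float) -> float:
    """Raw size of footage recorded at a constant bitrate
    (ignores container and filesystem overhead)."""
    return bitrate_mbps * 1e6 / 8 * minutes * 60

def naive_transfer_seconds(size_bytes: float, link_mbps: float,
                           efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link, scaled by an assumed protocol
    efficiency. Single-stream FTP over a long-haul TCP connection often
    achieves only a fraction of the nominal link rate."""
    return size_bytes * 8 / (link_mbps * 1e6 * efficiency)

# One minute of 50 Mbps HD video: 375,000,000 bytes (~0.375 GB)
one_minute = video_size_bytes(50, 1)

# One hour of footage from four cameras: 90 GB
one_hour_4cams = video_size_bytes(50, 60) * 4

# On a 1 Gbps link, with FTP achieving an assumed 10% of capacity over a
# high-latency WAN, moving that hour of footage takes 7200 s, i.e. 2 hours:
transfer_hours = naive_transfer_seconds(one_hour_4cams, 1000, efficiency=0.10) / 3600
print(round(one_minute), round(transfer_hours, 1))  # prints: 375000000 2.0
```

In other words, a shoot that records for one hour can take longer than that just to transfer, which is the bottleneck the article describes.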

Borrowing next-generation solutions developed for Media & Entertainment

Many professionals outside of digital film production and post-production are unaware that a solution to transferring those huge unstructured data sets has been around for over a decade. The core technology that’s been used within Hollywood enterprises for person-to-person transfers of very large uncompressed footage has recently been incorporated into SaaS solutions that connect with cloud storage.

“It wasn’t until the cloud revolution and the development of SaaS/PaaS (software-as-a-service/platform-as-a-service) solutions that this technology became relevant and accessible to everyone else,” said Hamilton.

As more and more companies look to utilize cloud-based big data analytics like Hadoop or any other cloud application that requires the movement of large data volumes, we may see the spread of technologies that helped make feature films and episodic television what they are today. Which is kind of interesting, no?

If you haven’t already, check out the full article by Ian Hamilton on Data Informed.
