Recognizing that their businesses depend upon the ability to easily get data into the cloud, the major public cloud providers offer a range of solutions for ingesting data to their platforms. There is increasing awareness that there is no one-size-fits-all approach, and customers are demanding a choice of offerings that are optimized for different use cases. Here are some of the most common:
One solution offered by cloud platforms is the option of purchasing dedicated connectivity from the customer’s data center to one of the provider’s global connection locations.
Amazon’s AWS Direct Connect and Microsoft’s Azure ExpressRoute provide the interconnect point necessary to connect using independently provisioned dedicated circuits and IP-layer connectivity.
Such solutions promise to provide greater security, more reliability, lower latency, and higher speeds than moving files over the public Internet, and can be a compelling and cost-effective solution for certain situations.
Dedicated connectivity is a good choice when all of the data is located in a single data center and destined for a fixed cloud location, and when constant data transfer over a long period of time is planned.
On the downside, this dedicated connectivity is inelastic and very expensive when not fully utilized — customers pay for provisioned interconnect bandwidth whether they use it or not. In addition to connectivity costs paid to the cloud provider, it typically requires capital investment for terminal equipment and engagement of a local loop network provider to connect the data center to the cloud provider’s Point of Presence (POP).
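The effect of paying for provisioned bandwidth regardless of use can be made concrete with a little arithmetic. The following sketch uses purely hypothetical port sizes and prices (no real provider's rates) to show how the effective cost per gigabyte climbs as utilization falls:

```python
# Illustrative arithmetic with hypothetical prices: a fixed monthly port
# fee is paid whether or not data flows, so the effective cost per GB
# moved rises sharply as utilization of the dedicated link drops.
def effective_cost_per_gb(port_gbps, monthly_fee_usd, utilization):
    """Effective $/GB for a dedicated port at a given average utilization."""
    seconds_per_month = 30 * 24 * 3600
    gb_moved = port_gbps * utilization * seconds_per_month / 8  # Gbit/s -> GB
    return monthly_fee_usd / gb_moved

# Hypothetical 10 Gbit/s port at $2,000/month:
for util in (1.0, 0.5, 0.1):
    cost = effective_cost_per_gb(port_gbps=10, monthly_fee_usd=2000,
                                 utilization=util)
    print(f"{util:>4.0%} utilized: ${cost:.4f}/GB")
```

At 10% utilization the per-gigabyte cost in this toy model is ten times what it is at full utilization, which is the inelasticity the paragraph above describes.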
This takes time to put in place and requires ongoing management, and depending on the distances involved there can still be a problem with latency.
Dedicated connectivity is also problematic for customers who are concerned about cloud platform lock-in. Such solutions are inflexible since they provide connectivity only to that provider’s cloud, meaning that switching or adding cloud vendors adds another significant layer of cost and complexity.
Another approach to uploading large files to object storage is to use a utility that will ‘chunk’ or ‘slice’ large files into multiple parts, send the various parts in parallel, and then reassemble them at the cloud destination.
Amazon offers a utility that provides this capability, and it may yield some improvement in upload speed. However, this approach does not fully address the intrinsic problem of TCP performance over high-latency networks: aggregating multiple TCP streams only multiplies the per-stream throughput linearly, and each stream remains limited by latency and packet loss.
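The linear-scaling limit can be illustrated with the widely cited Mathis et al. steady-state TCP throughput model, throughput ≈ (MSS/RTT) · (C/√p). Running N parallel streams multiplies this bound by N but does nothing to reduce the round-trip time or loss rate that cap each stream. A small sketch of the arithmetic:

```python
# Estimate aggregate TCP throughput using the Mathis et al. model:
#   throughput <= (MSS / RTT) * (C / sqrt(p)),  C ~ 1.22 for standard TCP.
# N parallel streams multiply the bound by N; the per-stream sensitivity
# to round-trip time (RTT) and loss rate (p) is unchanged.
import math

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate, streams=1, c=1.22):
    """Upper bound on aggregate throughput, in bits per second."""
    per_stream = (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))
    return streams * per_stream

# Example: 1460-byte MSS, 100 ms transcontinental RTT, 0.1% packet loss.
single = tcp_throughput_bps(1460, 0.100, 0.001)
eight  = tcp_throughput_bps(1460, 0.100, 0.001, streams=8)
print(f"1 stream:  {single / 1e6:.1f} Mbit/s")
print(f"8 streams: {eight / 1e6:.1f} Mbit/s")
```

On this long, mildly lossy path a single stream is bounded at roughly 4.5 Mbit/s regardless of link capacity; eight streams raise the bound eightfold, but any increase in RTT or loss degrades every stream in the same proportion.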
In practice, customers find it difficult to build robust solutions with this utility that work reliably under the load of large data sets being transferred across the public Internet. For deployments at scale, a simple multi-part upload implementation is unlikely to provide acceptable speed and reliability, and engineering investment will be required to build out a resilient implementation.
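The chunk-and-reassemble pattern itself is simple to state. The sketch below shows the core logic (split into fixed-size parts, transfer the parts concurrently, reassemble in order); the `upload_part` function here is a local stand-in for what would, in a real implementation such as S3 multipart upload, be an HTTP PUT per part, with the retry and integrity handling the paragraph above notes is the hard part:

```python
# Minimal sketch of the 'chunk and reassemble' approach: split a payload
# into fixed-size parts, send the parts concurrently, then reassemble
# them in order at the destination. upload_part() is a local placeholder
# for the per-part network transfer of a real multi-part upload.
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 5 * 1024 * 1024  # 5 MiB parts

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Return (index, chunk) pairs covering the payload."""
    return [(i, data[off:off + part_size])
            for i, off in enumerate(range(0, len(data), part_size))]

def upload_part(part):
    index, chunk = part
    # Placeholder for the network transfer of one part.
    return index, chunk

def parallel_upload(data: bytes, workers: int = 8) -> bytes:
    parts = split_into_parts(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        received = list(pool.map(upload_part, parts))
    # Parts may complete out of order; reassemble by index.
    received.sort(key=lambda p: p[0])
    return b"".join(chunk for _, chunk in received)

payload = bytes(range(256)) * 80_000   # ~20 MiB test payload
result = parallel_upload(payload)
```

What this sketch omits (per-part retries with backoff, checksums, resume after interruption, congestion behavior across thousands of files) is precisely the engineering investment that makes a robust production implementation costly.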
A fully productized solution such as Flight, which utilizes multi-part upload over the low-latency local-area network between the transfer server and object storage, is a faster, more secure, more robust choice.
Both AWS and Azure offer yet another approach to the same challenge through their respective AWS Import/Export Disk and Azure Import/Export services. Here the data is transferred between the customer’s on-premises storage and cloud storage through the use of physical hard drives shipped between the customer and the cloud provider.
A variation on this approach is Amazon’s new AWS Import/Export Snowball service. With Snowball the customer orders a physical storage appliance from Amazon, connects it to their local network, and loads it with up to 50 TB of data. The appliance is then shipped back to Amazon where the data is migrated to the customer’s S3 object storage tenancy.
This appliance mechanism can be highly efficient and economical for one-time or periodically recurring bulk cloud uploads, but the cost/GByte advantage can disappear if the appliance is not completely filled — and the request/deliver/copy/return/upload sequence inherently takes time.
This solution, as well as the simple disk-based import/export options, is optimized for data migration projects rather than use cases where there is an ongoing business need to move large data sets into the cloud.
For some data migration projects, it makes sense to use Flight in conjunction with physical media. Once the data has been initially transferred to the cloud via physical media, Flight can be used to update data that has changed since the original copy was created.
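One common way to identify what has changed since the bulk copy is to record a content-hash manifest when the physical media is loaded and later compare it against the current data set. The sketch below is purely illustrative (the file names and manifest format are hypothetical, not part of any product API):

```python
# Hypothetical sketch of the incremental step after a bulk physical-media
# copy: record a content-hash manifest when the disks are loaded, then
# later transfer only files that are new or whose content has changed.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def changed_files(current: dict, manifest: dict) -> list:
    """Return paths that are new or whose content hash differs."""
    return sorted(path for path, data in current.items()
                  if manifest.get(path) != digest(data))

# Manifest captured when the disks were loaded (illustrative names):
manifest = {"video/a.mov": digest(b"take 1"),
            "video/b.mov": digest(b"take 2")}

# State of the data set some weeks later:
current = {"video/a.mov": b"take 1",                  # unchanged
           "video/b.mov": b"take 2, color-graded",    # modified
           "video/c.mov": b"new footage"}             # new

print(changed_files(current, manifest))
```

Only the modified and new files are selected for network transfer, so the slow ship-and-copy step never has to be repeated.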
Other forms of UDP acceleration technology, both commercial and Open Source, are available on the market. Commercial on-premises UDP software can be deployed on VMs in the cloud, but this does not produce a robust result and requires significant upfront and ongoing resource investment.
While developers may be enthusiastic about trying to implement an Open Source protocol such as UDT, this seldom makes business sense. An Open Source protocol is just that: a simple bare-bones transport mechanism without support or regular updates.
Building and maintaining a complete solution, with security, reliability, and the ability to deal with a wide range of network conditions, is a task best left to professional software companies.
The first three ingest solutions described above have many legitimate uses, and as organizations transition to the cloud they will learn where each one fits best and employ it accordingly.
Likewise Flight does not fit every situation, but it is the best solution for many common use cases — and it stands alone as a turnkey, enterprise-class, cloud-native SaaS solution for cloud ingest.