Signiant introduces Metadata Everywhere, a new series about our innovative Software-Defined Content Exchange (SDCX) architecture and how it facilitates interactions with both media essence and metadata across disparate and distributed storage environments.
In the first part, Metadata Everywhere: How Signiant’s SDCX SaaS Platform Works with Metadata Across Distributed Content Repositories, Signiant provided an overview of 10 ways the Signiant platform interacts with metadata. This installment details the role of metadata in Signiant’s patented intelligent transport.
Data networks have always been complex, but in recent years network management and optimization have become a growing challenge for media companies. Driven by hybrid cloud and multi-cloud imperatives, and by workflows that are far more global and interconnected, there are more variables in play than ever. Dynamic network conditions, growing file sizes, distributed teams, increasingly global business partnerships, and constant change make manual performance tuning a never-ending and nearly impossible task. While there’s plenty of opportunity to gain efficiencies from new technology, bigger pipes, and the cloud, optimizing network performance is a battle, and one that needs fighting to meet the tight deadlines that define the modern media supply chain.
Signiant’s patented intelligent transport architecture was created for exactly that purpose: to provide fast, seamless, and secure access to media assets across complex network environments. Our Software-Defined Content Exchange (SDCX) SaaS platform aggregates metadata from billions of file transfers between more than 50,000 companies across almost every IP network scenario imaginable. The platform collects and anonymizes metadata about the assets themselves, the system resources available, and current and historical network conditions to inform the intelligent transport architecture. The architecture then uses machine learning to adapt automatically to continuously changing conditions, offering the best possible performance on any IP network.
Let’s dig in to explore how it all works:
Several settings can have a tremendous impact on transport speed. However, it’s difficult and time-consuming to continually monitor and manually adjust them. Not only do the optimal settings vary widely between operating environments, but they are also interdependent. With Signiant’s intelligent transport architecture, there are three main settings in play: TCP vs Signiant UDP, the amount of parallelism, and the use of pipelining, each explained in more detail in the following sections.
TCP vs Signiant UDP
A primary decision is whether to use standard TCP or Signiant’s proprietary UDP protocol as the lowest-level transfer reliability and flow control mechanism. Various factors affect this decision, such as the distance between the sending and receiving endpoints, the amount of network sharing or congestion, and how much control one has over the network itself. The other transfer settings also influence this choice, highlighting the complex interdependence between them. For example, TCP makes sense for a transfer between two servers in the same rack in a data center, whereas Signiant UDP would most likely be the choice for a transfer between Buenos Aires and Singapore over the public internet. But what about every scenario in between?
In every case, Signiant’s transport technology will ensure that all data, byte for byte, is moved successfully, reliably and in the correct order, while at the same time, choosing the settings that optimize performance.
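To see why distance and loss weigh so heavily on this choice, here is a rough sketch using the well-known Mathis approximation for the steady-state throughput of a single loss-based TCP stream. It is an illustration of the general principle only, not a description of how Signiant makes the decision, and the round-trip times and loss rates below are assumed values rather than measurements.

```python
import math

def tcp_throughput_bound_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis approximation: rate <= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2).

    Applies to one classic loss-based TCP stream; newer congestion control
    algorithms behave differently.
    """
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# Assumed, illustrative path characteristics (not measurements).
same_rack = tcp_throughput_bound_bps(1460, rtt_s=0.0002, loss_rate=1e-6)
long_haul = tcp_throughput_bound_bps(1460, rtt_s=0.350, loss_rate=1e-3)

print(f"same rack bound: {same_rack / 1e9:.1f} Gbit/s")
print(f"long haul bound: {long_haul / 1e6:.2f} Mbit/s")
```

Under these assumptions the short, clean path is limited by the link rather than by TCP, while the long, lossy path is capped at a small fraction of a typical link's capacity, which is the kind of gap a purpose-built UDP protocol aims to close.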
Amount of parallelism
The next setting controls parallelism: how many independent, simultaneous streams transport the asset(s) in question. As with lanes on a highway, the right number of streams depends on both the number and the size of the vehicles speeding to their destinations. But the content being moved isn’t the only factor that impacts the number of streams. The compute resources and the storage type at each endpoint play a major role in choosing the number of parallel streams used. For example, if machines at both ends have sixteen cores and are reading and writing to object storage, a high degree of parallelism makes sense, but a single-core machine writing to direct-attached storage with a single spinning disk would use low parallelism.
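The sixteen-core versus single-core example can be sketched as a simple rule of thumb. The function names, the cap of 16 streams, and the rule itself are hypothetical and chosen only for illustration; the platform's real decision draws on far more inputs.

```python
def suggest_stream_count(src_cores: int, dst_cores: int,
                         storage_parallel_friendly: bool,
                         max_streams: int = 16) -> int:
    """Hypothetical heuristic: the weaker endpoint and the storage type set the limit."""
    if not storage_parallel_friendly:
        # e.g. a single spinning disk, where concurrent access competes for one head
        return 1
    return max(1, min(src_cores, dst_cores, max_streams))

def split_ranges(total_bytes: int, streams: int) -> list:
    """Split [0, total_bytes) into contiguous (start, end) byte ranges, one per stream."""
    base, extra = divmod(total_bytes, streams)
    ranges, start = [], 0
    for i in range(streams):
        end = start + base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, end))
        start = end
    return ranges

streams = suggest_stream_count(16, 16, storage_parallel_friendly=True)
print(streams, split_ranges(10, 3))
```

Each byte range could then be handed to its own stream, with the receiver reassembling the ranges in order.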
Pipelining
Pipelining refers to how the metadata of the files being transferred is encoded, including demarcation of the files, file names, permissions, creation date, accessed date, and other file attributes. Much like parallelism, pipelining configuration depends both on the makeup of the dataset and the storage types in play. If the files are small enough, this metadata can amount to more data than the content of the files, and how it is transmitted through the network is critical. Also, with older storage technologies (file systems and spinning disks), the time to open a file can be substantial, so many files must be opened in parallel. Further, protocols like FTP only transmit one file over each TCP connection, so TCP connection setup time becomes a big issue. Signiant uses protocols that allow multiple files to be sent over a single connection, with either TCP or Signiant UDP.
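One common way to send many files over a single connection is to prefix each file with a small header that marks where it starts and ends. The sketch below shows that general technique with an invented record layout (name length, content length, name, content); it is not Signiant's wire format, and an in-memory buffer stands in for the network connection.

```python
import io
import struct

# Hypothetical header: 2-byte name length + 8-byte content length, network byte order.
HEADER = struct.Struct("!HQ")

def write_file_record(stream, name: str, content: bytes) -> None:
    """Append one file (header, name, content) to the shared stream."""
    encoded = name.encode("utf-8")
    stream.write(HEADER.pack(len(encoded), len(content)))
    stream.write(encoded)
    stream.write(content)

def read_file_records(stream):
    """Yield (name, content) pairs until the stream is exhausted."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return
        name_len, content_len = HEADER.unpack(header)
        name = stream.read(name_len).decode("utf-8")
        yield name, stream.read(content_len)

wire = io.BytesIO()  # stands in for one open connection; a real socket may return short reads
for name, content in [("a.txt", b"hi"), ("b.txt", b""), ("c.bin", b"\x00\x01")]:
    write_file_record(wire, name, content)

wire.seek(0)
received = list(read_file_records(wire))
print(received)
```

Because the receiver always knows where one file ends and the next begins, no new connection has to be negotiated between files.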
The TCP vs. Signiant UDP decision, the amount of parallelism, and pipelining settings all directly impact transport performance, but the optimal configuration depends on a variety of inputs that change constantly, with each setting influencing the others. With so many possible combinations and such fluid conditions, it’s simply not practical to manage this manually.
So, what metadata inputs does Signiant’s intelligent transport architecture use to optimize performance and select the proper settings?
Signiant’s machine learning model uses current and historical anonymized information about operating conditions to make intelligent decisions about which transport settings to use. Every time the platform accesses a media asset, information about the operating environment, the settings used, and the results achieved is captured by our SaaS service and used to inform future transfer decisions.
A key advantage of Signiant’s SaaS platform is the massive pool of data informing the system, which every business on the platform benefits from. The more businesses connected, the more Signiant learns about the wide variety of conditions a business may face. With more than 50,000 businesses currently connected, there’s a wealth of data to draw from, and the platform has very likely already seen situations similar to the conditions in play.
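The record-and-reuse loop described above can be pictured with a deliberately simple stand-in: a lookup table that remembers the throughput each combination of settings achieved in each kind of environment and picks the best average. This is a toy sketch, not Signiant's machine learning model; the class name, the environment labels, and the throughput figures are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

class TransferHistory:
    """Toy stand-in for a learned model: remembers results per environment."""

    def __init__(self):
        # environment -> settings -> list of observed throughputs (Mbit/s)
        self._results = defaultdict(lambda: defaultdict(list))

    def record(self, environment: tuple, settings: tuple, throughput_mbps: float) -> None:
        self._results[environment][settings].append(throughput_mbps)

    def best_settings(self, environment: tuple, default: tuple) -> tuple:
        seen = self._results.get(environment)
        if not seen:
            return default  # nothing similar observed yet
        return max(seen, key=lambda s: mean(seen[s]))

history = TransferHistory()
env = ("long_haul", "object_storage")          # hypothetical environment label
history.record(env, ("tcp", 1), 40.0)          # made-up results for illustration
history.record(env, ("udp", 8), 620.0)
history.record(env, ("udp", 8), 580.0)

choice = history.best_settings(env, default=("tcp", 1))
print(choice)
```

The larger the pool of recorded transfers, the less often the lookup falls back to the default, which is the intuition behind pooling anonymized data across many businesses.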
Type of Content
The first kind of operating environment metadata comes from the content being moved. Signiant’s next generation transport handles both stream and file-based data. For stream-based content, the maximum or target bit rate of the content is a critical piece of metadata that informs how fast a transmission needs to be. If the stream is already broken into chunks, that informs decisions about parallelism and pipelining. For file-based content, both the size of files and the number of files impact the ratio of file content to file metadata. The type of content, combined with the type of storage, often informs the optimal transport settings as well.
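The ratio of file content to file metadata is easy to estimate once a per-file metadata size is assumed. In the sketch below, the 256 bytes per file is an arbitrary assumption for illustration; the real figure depends on file names, attributes, and encoding.

```python
def metadata_share(file_sizes, metadata_bytes_per_file: int = 256) -> float:
    """Fraction of all transferred bytes that is per-file metadata (assumed fixed size)."""
    content = sum(file_sizes)
    metadata = metadata_bytes_per_file * len(file_sizes)
    total = content + metadata
    return metadata / total if total else 0.0

many_small_files = [100] * 10_000      # ten thousand 100-byte files
one_large_file = [100 * 10**9]         # a single 100 GB file

print(f"small files: {metadata_share(many_small_files):.1%} metadata")
print(f"large file:  {metadata_share(one_large_file):.1e} metadata")
```

With many tiny files, metadata dominates the transfer, so how it is encoded matters most; with one huge file, it is negligible and raw data rate matters instead.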
Type of Storage
In an increasingly hybrid cloud, multi-cloud world, organizations often deploy complex, multi-faceted storage environments. Information about storage on either end of a transfer is another set of environmental metadata impacting performance and informing how the platform optimizes transport.
The characteristics of that storage affect the time it takes to start reading from or writing to storage (seek and open time), how fast data can actually be read from or written to storage (data transfer rate), and how well parallel access to storage is supported.
Bandwidth and Network Conditions
Some of the most critical environmental variables are, of course, related to network conditions — available bandwidth, the distance data is travelling, and how much the network can be managed. A private network may offer Quality of Service (QoS) controls whereas a public internet connection is subject to whatever else may be happening at the edge, core, or peering points at any given moment. Technically speaking, distance and congestion impact latency, jitter, and packet loss and inform when to use TCP vs Signiant UDP, as well as parallelism and pipelining decisions.
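The interplay of bandwidth and distance can be made concrete with the bandwidth-delay product: the amount of data that must be in flight to keep a link full. The link speed and round-trip time below are assumed values for illustration, and the 64 KiB window is the classic TCP limit when window scaling is not in use.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to fill the path: bandwidth * round-trip time."""
    return bandwidth_bps * rtt_s / 8

def window_limited_rate_bps(window_bytes: int, rtt_s: float) -> float:
    """Maximum rate of one stream that can have at most window_bytes unacknowledged."""
    return window_bytes * 8 / rtt_s

rtt = 0.2                                              # assumed 200 ms round trip
bdp = bandwidth_delay_product_bytes(1e9, rtt)          # assumed 1 Gbit/s link
capped = window_limited_rate_bps(64 * 1024, rtt)       # 64 KiB window, no window scaling

print(f"data in flight needed: {bdp / 1e6:.0f} MB")
print(f"rate with a 64 KiB window: {capped / 1e6:.1f} Mbit/s")
```

The longer the round trip, the more data must be outstanding at once, which is one reason distance feeds into the choice of protocol and the number of parallel streams.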
Another critical piece of operating environment metadata Signiant considers is the computing resources at each transfer endpoint. The amount of memory, the CPU speed, and the number of CPUs at the disposal of both source and target all impact how we tune a transfer. This information influences all three of the key settings: pipelining, parallelism, and reliability mechanism.
Signiant has a unique lens into the challenges of today’s media supply chain. Our SDCX SaaS platform connects businesses of all sizes, all over the world. Every day, organizations access millions of media assets via our platform, which can handle files of any size, datasets with millions of smaller files, and live streams. Because of this, we see every type of private and public IP network with a wide variety of available bandwidth and constantly changing conditions. Each of these elements impacts performance, which makes optimizing content exchange a constant battle. By combining our advanced transport technology with the power of machine learning applied to the vast amount of metadata aggregated across our platform, Signiant’s patented intelligent transport architecture fights that battle, so you don’t have to.