In my role at Ciber, I have had the opportunity to talk with industry leaders who are making a difference in helping companies understand how to gain business value from their data. In the area of Big Data, one of those thought leaders is Dr. Tom Bradicich, Fellow at National Instruments, a Ciber strategic alliance partner. Next week, at NIWeek, National Instruments’ global conference in Austin, Texas, we will be talking with other NI partners and attendees about how they can unlock the business value of Big Analog Data.
We have invited Tom to be our guest blogger this week and provide his views on this subject.
Tom Bradicich, PhD
Fellow and Corporate Officer
linkedin.com/in/tombradicichphd | tombradicichphd.tumblr.com | twitter.com/tombradicichphd
For my job at National Instruments, I travel the world and see firsthand how engineers and scientists are acquiring vast amounts of data at very high speeds and in a variety of forms. I’ve seen tens of terabytes created in just a few seconds of physics experiments, and similar amounts generated in hours of measurements on jet engines or turbines used for electric power generation. Immediately after this data acquisition, a big data – or “Big Analog Data™” – problem exists. From my background in the IT industry with IBM, it’s clear to me that advanced tools and techniques are required for data transfer, management, and analytics, as well as for systems management of the many data acquisition and automated test system nodes.
I call this the “Big Analog Data™” challenge simply because it’s both big and “analog”. That is, the sources of this data are physical phenomena generated by nature or machines: light, sound, temperature, voltage, radio signals, moisture, vibration, velocity, wind, motion, magnetism, particulates, acceleration, current, pressure, time, location, and so on, as illustrated in the figure. When testing a smart phone at the end of a manufacturing line, there are many analog phenomena to measure, such as sound, three radios (cell, Bluetooth, and Wi-Fi), vibration, light, touch, video, orientation, and location.
Related to this challenge is data archive management. I’ve spoken with engineers who commonly tolerate a type 2 error (keeping worthless data) in order to avoid committing a type 1 error (discarding valuable data). Advances in real-time data analytics will help bifurcate the data accordingly; however, I’m not sure whether these advances will cause us to discard more data or to keep more data.
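The trade-off Tom describes can be made concrete with a small sketch. This is a minimal illustration, not a real analytics pipeline: the “value” scores, the labels, and the `bifurcate` helper are all made-up assumptions, used only to show how a conservative keep threshold trades one error type for the other.

```python
# Illustrative sketch of the archive trade-off: a conservative keep
# threshold avoids type 1 errors (discarding valuable data) at the
# cost of type 2 errors (keeping worthless data).
# All scores and labels below are invented for illustration.

records = [
    {"id": 1, "score": 0.9, "valuable": True},
    {"id": 2, "score": 0.2, "valuable": False},
    {"id": 3, "score": 0.4, "valuable": True},   # low score, but valuable
    {"id": 4, "score": 0.3, "valuable": False},
]

def bifurcate(records, keep_threshold):
    """Split records into kept/discarded and count each error type."""
    kept = [r for r in records if r["score"] >= keep_threshold]
    discarded = [r for r in records if r["score"] < keep_threshold]
    type1 = sum(r["valuable"] for r in discarded)   # valuable data lost
    type2 = sum(not r["valuable"] for r in kept)    # worthless data kept
    return type1, type2

print(bifurcate(records, 0.5))   # strict threshold: (1, 0) -- loses record 3
print(bifurcate(records, 0.1))   # conservative: (0, 2) -- keeps the junk too
```

Lowering the threshold eliminates the type 1 errors engineers fear, but every worthless record kept is storage and archive-management cost, which is exactly the tension better real-time analytics would ease.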
Characterizing Big Analog Data™
In general, Big Analog Data™ is a form of big data, which the literature commonly characterizes by a combination of three or four “V’s” – Volume, Variety, Velocity, and Value. In addition, another “V” of big data I’m seeing is “Visibility”: globally dispersed enterprises need access to the data in multiple locations, both to run analytics and to see the results.
Big Analog Data™ is distinguished from all other big data in three fundamental ways. First, it’s “older”, in that many Big Analog Data™ sources are natural analog phenomena such as light, motion, and magnetism. These natural sources have been around since the beginning of the universe. I think…
Second, it’s “faster” since some analog time-series signals require digitizing at rates as fast as tens of gigahertz, and at much wider bit widths than other big data. And third, it’s “bigger” because Big Analog Data™ information is constantly being generated from both nature and electrical and mechanical machinery. Consider the unceasing light, motion, and electromagnetic waves all around us right now.
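To put those digitizing rates in perspective, here is a back-of-the-envelope calculation in Python. The 10 GS/s sample rate, 16-bit width, and channel count are illustrative assumptions, not figures from any particular instrument or experiment.

```python
# Back-of-the-envelope data-rate arithmetic for a high-speed digitizer.
# Sample rate, bit width, and channel count are illustrative assumptions.

def bytes_per_second(sample_rate_hz: float, bits_per_sample: int,
                     channels: int = 1) -> float:
    """Raw acquisition rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels / 8

# One channel digitized at 10 GS/s with 16-bit resolution:
rate = bytes_per_second(10e9, 16)
print(f"{rate / 1e9:.0f} GB/s per channel")       # 20 GB/s per channel

# Aggregated across hundreds of channels, as in a large physics
# experiment, a few seconds reaches tens of terabytes:
total = bytes_per_second(10e9, 16, channels=500) * 3  # 3 seconds
print(f"{total / 1e12:.0f} TB")                   # 30 TB
```

Even a single wide-bit channel at gigahertz rates produces tens of gigabytes per second, which is why the volume problem begins the instant acquisition starts.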
The Value of Big Analog Data™
When selling to non-technical businesses such as retail or travel, the sales proposition for big data fundamentally has two phases. Phase one is, “you should acquire lots more data because there’s great value in it”, and phase two is, “you should buy my hardware and software to extract that value”. However, in my business of test, measurement, and control, phase one is usually skipped because engineers and scientists inherently understand statistical significance. That is, it’s intuitive that small data sets can limit the accuracy of conclusions and predictions.
With advanced acquisition techniques and big data analytics, insights can be derived that were never before possible. For example, I’m working with companies seeking greater visibility into test and asset-monitoring data, helping them identify emerging quality trends or predict machine failures. With rotating machinery, converting an unplanned surprise outage into a planned maintenance outage has great value. In scientific labs, we work to accelerate discovery with high-speed, highly accurate measurements in experimentation.
The 3 Tier Big Analog Data™ Solution
To enjoy these benefits, end-to-end solutions are needed to deliver maximum insight in the most economical way. I’ve seen cases where many devices are under test and many distributed automated test system nodes are needed. Since these test systems are effectively computer systems with software images and drivers, the need arises for remote, network-based systems management tools to automate their configuration, maintenance, and upgrades. Growing data volumes force global companies to ensure data access for many more engineers and data scientists than in the past, which requires network gear and data management affording multiuser access to geographically distributed data. I’m seeing the cloud gain favor for both data access and the scalability of simulations and analytics.
Big Analog Data™ solutions are partitioned into a three-tier architecture, as shown in the figure. These tiers come together as a single, integrated solution delivering insight from the real-time point of data capture (sensors) to analytics in the back-end IT infrastructure. Data flows across “The Edge”, the point where data acquisition and test system nodes connect to traditional IT equipment. Data then hits a network switch in the IT infrastructure tier, where servers, storage, and networking manage, organize, further analyze, and archive it.
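The flow through the three tiers can be sketched as a simple pipeline. This is a conceptual illustration only: the tier boundaries follow the description above, but every function name and data shape here is my own assumption, not an NI API or product behavior.

```python
# A minimal sketch of the three-tier data flow: sensors -> test/DAQ
# node at The Edge -> back-end IT. All names and the dict-based data
# shape are illustrative assumptions, not NI software.

def sensor_tier(raw_signal):
    """Tier 1: acquire and digitize an analog signal at the source."""
    return [{"t": i, "value": v} for i, v in enumerate(raw_signal)]

def system_tier(samples, threshold):
    """Tier 2: real-time reduction at the test/DAQ node -- forward
    only the samples worth sending across The Edge."""
    return [s for s in samples if abs(s["value"]) >= threshold]

def it_tier(records):
    """Tier 3: servers and storage -- organize, analyze further,
    and archive what arrives from the middle tier."""
    archive = sorted(records, key=lambda r: r["t"])
    summary = {"count": len(archive),
               "peak": max((r["value"] for r in archive), default=None)}
    return archive, summary

signal = [0.1, 2.5, -0.05, 3.2, 0.0]
samples = sensor_tier(signal)
reduced = system_tier(samples, threshold=1.0)
archive, summary = it_tier(reduced)
print(summary)   # {'count': 2, 'peak': 3.2}
```

The point of the sketch is the division of labor: the middle tier decides in real time what crosses The Edge, so the IT tier sees a reduced, structured stream rather than the raw firehose.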
It’s interesting to note that in the IT industry, the point at which data first hits a server is referred to as “real time”. However, in my world of test and measurement, by the time data flows through the middle tier, over The Edge, and onto a server, it’s quite aged. That said, the spectrum of value spans all five phases of the data flow (see the figure above), from real time to archived. Real-time analytics are needed to determine the immediate response of a control system and adjust accordingly, such as in military applications or precision robotics. At the other end, archived data can be retrieved for comparative analysis against newer in-motion data, for example to gain insight into the seasonal behavior of an electrical power generation turbine.
Significant in-motion and early-life analytics and visualization take place in the solution’s middle tier, via National Instruments CompactRIO, NI CompactDAQ, and PXI systems, and software such as LabVIEW, DIAdem, and DataFinder. Through my experience with end-to-end solutions, I know the value of skilled systems integrators such as Ciber. A strong focus on the interplay among the solution tiers greatly lessens deployment and integration risk and reduces time-to-value.
Well, enough about me and what I think; reply below to let me know your Big Analog Data™ thoughts.