Machine-generated data

Machine-generated data is information automatically generated by a computer process, application, or other mechanism without the active intervention of a human. While the term dates back over fifty years,^[1] there is currently some disagreement about its scope. Monash Research's Curt Monash defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices."^[2] Meanwhile, Daniel Abadi, a computer science professor at Yale, proposes a narrower definition: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action."^[3] Despite their differences, both definitions exclude data manually entered by a person.^[4] Machine-generated data spans all industry sectors. People are often, and increasingly, unaware that their actions generate such data.^[5]

Relevance

Machine-generated data has no single form; rather, its type, format, metadata, and frequency depend on the particular business purpose it serves. Machines often create it on a defined time schedule or in response to a state change, action, transaction, or other event. Because the data records an event that has already happened, it is rarely updated or modified afterward. Partly because of this quality, U.S. courts consider machine-generated data highly reliable.^[6]
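The two triggers described above, a fixed schedule and a state change, can be sketched for a hypothetical sensor as follows. This is a minimal illustration under stated assumptions: the `Record` type, the `generate_records` function, the numeric readings, and the `interval` and `threshold` parameters are all invented for the example. Records are frozen dataclasses appended to a list, which mirrors the point that historical machine data is written once and not edited.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Record:
    tick: int     # time step at which the record was emitted
    kind: str     # "scheduled" or "state_change"
    value: float  # the reading captured at that tick


def generate_records(readings: List[float], interval: int, threshold: float) -> List[Record]:
    """Hypothetical generator: emit a record every `interval` ticks
    (schedule-driven) and another whenever the reading crosses `threshold`
    (event-driven). The log is append-only; records are never edited."""
    log: List[Record] = []
    above: Optional[bool] = None  # state at the previous tick; None before the first reading
    for tick, value in enumerate(readings):
        if tick % interval == 0:
            log.append(Record(tick, "scheduled", value))
        state = value > threshold
        if above is not None and state != above:
            log.append(Record(tick, "state_change", value))
        above = state
    return log


log = generate_records([20.0, 21.0, 26.0, 27.0, 24.0, 23.0], interval=3, threshold=25.0)
for record in log:
    print(record)
```

Here the sensor emits scheduled records at ticks 0 and 3 and state-change records at ticks 2 and 4, when the reading rises above and then falls back below the threshold.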

Machine-generated data is central to the Internet of Things (IoT).^[7]

Growth

In 2009, Gartner predicted that data would grow by 650% over the following five years.^[8] Most of this growth comes from machine-generated data.^[4] IDC estimated that by 2020 there would be 26 times more connected things than people.^[9] Wikibon forecast that $514 billion would be spent on the Industrial Internet in 2020.^[10]
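As a worked check, the Gartner figure can be converted into an implied annual growth rate r. This assumes that "grow by 650%" means the final volume V_5 is 7.5 times the starting volume V_0, and that growth compounds evenly each year; the forecast as cited states neither, so the result is only indicative.

```latex
V_5 = (1 + 6.5)\,V_0 = 7.5\,V_0, \qquad
(1 + r)^5 = 7.5 \;\Rightarrow\; r = 7.5^{1/5} - 1 \approx 0.496
```

Under these assumptions, the forecast corresponds to roughly 50% growth per year.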

Processing

Because machine-generated data is largely static but voluminous, data owners rely on highly scalable tools to process and analyze it. Almost all machine-generated data is unstructured when created and is then converted into a common structure.^[4] These derived structures typically contain many data points (columns), and the main challenge lies in analyzing them. Given high performance requirements and large data volumes, traditional database indexing and partitioning limit the size and history of the dataset that can practically be processed. Columnar databases offer an alternative: because a given analysis usually accesses only certain "columns" of the dataset, storing data by column lets the database read just those columns.
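The access pattern that favors columnar storage can be sketched in memory. In this minimal example the records, field names, and the helper functions `to_columns` and `total_bytes_for_status` are hypothetical, and the data is not drawn from any cited source. Rows are pivoted into one list per field, and an aggregate then reads only the two columns it needs.

```python
from typing import Any, Dict, List

# Row-oriented layout: each record keeps all of its fields together.
rows: List[Dict[str, Any]] = [
    {"host": "10.0.0.1", "status": 200, "bytes": 512, "path": "/index.html"},
    {"host": "10.0.0.2", "status": 404, "bytes": 128, "path": "/missing"},
    {"host": "10.0.0.1", "status": 200, "bytes": 2048, "path": "/logo.png"},
]


def to_columns(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, List[Any]]:
    """Pivot row records into a column-oriented layout: one list per field,
    where position i in every list belongs to record i."""
    return {field: [record[field] for record in records] for field in fields}


def total_bytes_for_status(columns: Dict[str, List[Any]], status: int) -> int:
    """Sum the 'bytes' column over rows whose 'status' matches. Only these two
    columns are read; 'host' and 'path' are never touched."""
    return sum(size for code, size in zip(columns["status"], columns["bytes"]) if code == status)


columns = to_columns(rows, ["host", "status", "bytes", "path"])
print(total_bytes_for_status(columns, 200))  # prints 2560
```

A real columnar database also stores, and often compresses, each column separately on disk, so it can skip the I/O for unread columns entirely; this sketch only shows the access pattern, not the storage engine.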

Examples

Web server logs^[11]
Call detail records^[11]
Financial instrument trades^[11]
Network event logs^[11]
Security information and event management (SIEM) logs
Telemetry collected by the government^[11]
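Web server logs show how unstructured machine output is converted into a common structure. The sketch below parses one line in the Common Log Format used by many web servers into a record of named fields. It is a simplified illustration: the regular expression, the `parse_clf_line` function, and the sample line are assumptions for this example. The pattern does not handle escaped quotes inside the request, and it ignores any extra fields such as referrer or user agent.

```python
import re
from typing import Any, Dict, Optional

# Simplified pattern for the Common Log Format:
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line: str) -> Optional[Dict[str, Any]]:
    """Turn one raw log line into a structured record, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record: Dict[str, Any] = match.groupdict()
    record["status"] = int(record["status"])
    # "-" means no body was sent; store it as 0 bytes so the column stays numeric.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_clf_line(line))
```

Lines that do not match return None rather than raising an exception, so a batch job can count or set aside malformed entries instead of stopping.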

Notes

Reference List

[1] Chorafas, Dimitris N., Control Systems Functions and Programming Approaches: Applications, 1966, https://books.google.com/books?id=D1qDu4nTmvsC
[2] Monash, 12/30/2010
[3] Abadi
[4] Monash, Three Broad Categories of Data
[5] Deloach, Machine Generated Data
[6] Federal Evidence Review, Machine Generated Data was Not Statement and Raised no Hearsay
[7] https://twitter.com/SethGrimes/status/707286159135784960
[8] ScienceLogic
[9] Chuck's Blog
[10] Wikibon
[11] Monash, Examples of Machine Generated Data

Bibliography

Abadi, Daniel. "Machine vs. Human generated data". BlogSpot.
Deloach, Don. "Machine Generated Data". Infobright, Inc.
Federal Evidence Review. "Machine Generated Data Was Not Statement and Raised no Hearsay or Confrontation".
Monash, Curt. "Three Broad Categories of Data". Monash Research.
Monash, Curt. "Examples of Machine Generated Data". Monash Research.
Monash, Curt. "Examples and definition of machine-generated data". Monash Research.
ScienceLogic. "Gartner Ten Technologies to Watch".

This article is issued from Wikipedia - version of the 5/10/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.