Data


Data (US: /ˈdætə/; UK: /ˈdeɪtə/) are individual facts, statistics, or items of information, often numeric.[1] In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects,[1] while a datum (singular of data) is a single value of a single variable.[2]

Although the terms "data" and "information" are often used interchangeably, they have distinct meanings. In some popular publications, data are said to be transformed into information when they are viewed in context or in post-analysis.[3] In academic treatments of the subject, however, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations).

In general, data are atoms of decision making: they are the smallest units of factual information that can be used as a basis for reasoning, discussion, or calculation. Data can range from abstract ideas to concrete measurements, including statistics. Data are measured, collected, reported, and analyzed, and used to create data visualizations such as graphs, tables or images. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.

Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording.
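The cleaning and staging described above can be illustrated with a minimal Python sketch. It is only one possible approach: the plausible temperature range, the function names `clean_readings` and `daily_mean`, and the sample readings are all assumptions made for illustration, and real projects typically use more careful, domain-specific rules for deciding what counts as an error.

```python
from statistics import mean

# Hypothetical plausible range (degrees Celsius) for an outdoor Arctic
# thermometer; the bounds are an assumption chosen for illustration.
PLAUSIBLE_RANGE_C = (-70.0, 25.0)


def clean_readings(raw_readings, plausible_range=PLAUSIBLE_RANGE_C):
    """Stage 1: drop readings that fall outside the plausible range."""
    low, high = plausible_range
    return [r for r in raw_readings if low <= r <= high]


def daily_mean(cleaned_readings):
    """Stage 2: summarize the readings produced by stage 1."""
    return mean(cleaned_readings)


# Made-up raw data; 29.4 stands in for an implausible "tropical" reading.
raw = [-31.5, -30.8, 29.4, -32.1, -29.9]

cleaned = clean_readings(raw)   # processed output of stage 1 ...
summary = daily_mean(cleaned)   # ... serves as the raw input of stage 2

print(cleaned)
print(summary)
```

Here the list returned by `clean_readings` is "processed data" from the point of view of the first stage and "raw data" from the point of view of the second.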

The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954.[6]

The Latin word data is the plural of 'datum', "(thing) given," neuter past participle of dare "to give".[6] In English the word data may be used as a plural noun in this sense. Some writers, usually those working in the natural sciences, life sciences, and social sciences, use datum in the singular and data for the plural, especially in the 20th century and in many cases also in the 21st; for example, APA style as of the 7th edition still requires "data" to be plural.[7] However, in everyday language and in much of the usage of software development and computer science, "data" is most commonly used in the singular as a mass noun (like "sand" or "rain"). The term big data takes the singular.

Data, information, knowledge, and wisdom are closely related concepts, but each has its own role in relation to the others, and each term has its own meaning. According to a common view, data are collected and analyzed; data only becomes information suitable for making decisions once it has been analyzed in some fashion.[8] One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy.
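As a rough illustration of the last point, the following Python sketch estimates the Shannon entropy of a stream from its observed symbol frequencies, using H = Σ p · log2(1/p) in bits per symbol. The function name `shannon_entropy` is hypothetical, and the estimate treats symbols as independent draws from a fixed distribution, which is a simplifying assumption that real data streams often violate.

```python
from collections import Counter
from math import log2


def shannon_entropy(stream):
    """Estimate entropy in bits per symbol from observed symbol frequencies."""
    total = len(stream)
    if total == 0:
        return 0.0  # convention chosen here: an empty stream carries no information
    counts = Counter(stream)
    # Sum p * log2(1/p) over the distinct symbols seen in the stream.
    return sum((c / total) * log2(total / c) for c in counts.values())


print(shannon_entropy("aaaa"))  # fully predictable stream
print(shannon_entropy("abab"))  # two equally frequent symbols
print(shannon_entropy("abcd"))  # four equally frequent symbols
```

A stream made of a single repeated symbol is entirely expected and has zero entropy under this estimate, while streams whose symbols are more evenly spread score higher, which matches the idea that unexpected data are more informative.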


Some of the different types of data.
Adrien Auzout's "A TABLE of the Apertures of Object-Glasses" from a 1665 article in Philosophical Transactions