Here’s the Director’s Letter from Reagan Waskom:
Big promises are being made for big data. The tsunami of data resulting from new technologies has created some headaches, but also intriguing opportunities. Satellite images, wireless sensor networks, and model output
all produce data that must be processed and analyzed to create useful and reliable information. Data alone does not enhance our understanding or management decisions; rather, it must be transformed into accurate, reliable information to become truly useful.
Data acquisition capacity has grown to the extent that a new branch of information sciences has emerged, known as big data. Big data has been called a “fad” in scientific research, but it is more accurate to call it a “hot topic,” as we know the cascade of data from expanding new technologies will only continue. Numerous scientific conferences have been held and papers published on the topic of big data since the Obama Administration launched the Big Data Research and Development Initiative in 2012 to “greatly improve tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.”
Big data has been described as high-volume, high-velocity, and/or high-variety information in excess of one terabyte that is too large for a single machine to handle and that traditional techniques cannot adequately analyze. This definition is fluid and may soon be stated in petabytes, but it also encompasses the velocity at which data is acquired from multiple independent sources. Thus, cloud-linked servers are typically needed to adequately store and process the data. Real-time acquisition and processing that enables trend detection and improved decision-making is the goal of businesses and government agencies seeking to exploit big data. In other cases, the goal may be to enable public access to useful, interesting, or important information.
A number of questions must be resolved as we develop new data technologies and capacity. For example, who owns big data when it is crowd-sourced or provided by multiple public and private entities? How does the information remain secure and individual privacy protected? From a scientific perspective, what about data quality and veracity? How do we avoid sampling bias and misinterpretation? Again, data itself is not the goal, but the information gleaned from the data can enhance our understanding of trends, processes, demographics, etc.
Water data collected from multiple public water systems (such as those used in past Statewide Water Supply Investigations conducted by the CWCB) is an example of using big data to identify statistical patterns that suggest significant correlations and trends in water use and conservation, to forecast future demands, and to optimize coordination of resources. Water managers with multiple sources of water supply could also benefit from better data-driven forecasting and real-time operations. Sensor technologies have arrived on the market to help water utilities survey underground pipes and detect leaks. Smart meters could help managers and individual users fine-tune their systems. In terms of academic research, both the NSF-funded NEON and CUAHSI networks described in this newsletter have been organized to provide big and open data to researchers. NEON represents the largest single investment in ecological research data ever made. This “research infrastructure” is transforming our ability to advance data visualization and statistical methods to understand patterns and processes and to detect outliers.
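As a minimal illustrative sketch of the kind of trend detection described above, one might fit a least-squares line to a utility’s annual per-capita water use. The figures below are hypothetical, not drawn from any actual utility or CWCB dataset:

```python
# Minimal sketch: fitting a linear trend to annual per-capita water use.
# All data values here are hypothetical, for illustration only.

def linear_trend(years, values):
    """Return (slope, intercept) of the least-squares line values ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical gallons-per-capita-per-day (GPCD) figures for one utility.
years = [2008, 2009, 2010, 2011, 2012, 2013]
gpcd = [172, 168, 165, 160, 158, 154]

slope, intercept = linear_trend(years, gpcd)
print(f"Trend: {slope:.1f} GPCD per year")  # a negative slope indicates declining use
```

In practice such an analysis would run over data pooled from many systems and would include significance testing and checks for sampling bias, but the core idea, turning raw readings into an interpretable trend, is the same.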
The value of big data is the opportunity to answer big questions. What is also exciting about big and open data is the potential for innovations that can improve our decision-making capacity. This issue of the Colorado Water newsletter provides examples of how big data for water can be accessed and used. The data tsunami keeps coming at us; the power of that data to help solve big water challenges is ours to capture.