Internet of Things: Live Data Quality Assessment for a Sensor Network
TL;DR: We believe that connected devices and real-time data analytics are the next big things in GIS. Here is a live dashboard for a sensor network in 7 cities around the world.
Geoinformation systems have evolved rapidly in recent years, and the future looks more exciting than ever. All major IT trends, such as the Internet of Things (IoT), big data and real-time systems, are directly related to our professional domain: smart devices have a location and may even move over time, and big data and real-time systems almost always need locational analytics.
This is why we got interested when we heard about the “Sense Your City Art Challenge”, a competition to make sense of a network of DIY sensors spread over 7 cities on 3 continents. To be honest, our interest was not so much drawn to the “art” aspect; in the end, we are engineers and feel more at home with data and technology. And there is real-time sensor data available within the challenge: about 14 sensor nodes in every city deliver approximately 5 measurements every 10 seconds, such as temperature, humidity or air quality. The sensor data is freely available. When we looked at the numbers, we realized that the data had some surprising properties; for example, the temperature varies quite a bit within one city.
Our goal: Live data quality assessment for a sensor network
So we approached the challenge a bit differently, from an engineering perspective: how do you implement a real-time quality assessment system for sensor data? As an example, we took the following questions, which need to be re-evaluated as new sensor data comes in:
- Are there enough sensors delivering data for each city?
- How much do the sensor measurements vary within a city?
- How do the sensor measurements compare to external data?
Our solution: A live dashboard with real-time statistics
My colleague Patrick Giedemann and I started late last week and developed a live dashboard with real-time statistics for the sensor network of seven cities. The dashboard is implemented as a story map containing one world view and seven city-level views. The components of the views are:
- A heatmap showing a condensed view of the analysis for each of the cities, labeled with numbers 2 to 8. For example, we visualize the number of sensor values for each city within a time frame of 30 seconds. The darker the blue bucket, the more sensor signals we got. Light buckets indicate a low number of signals in the time frame.
- Another heatmap, which shows the coefficient of variation for each city, again within a time frame of 30 seconds.
- A gauge showing the number of sensor signals for a city and a line chart with the minimum, maximum and average temperature for a city.
We haven’t yet got around to showing real weather data, although it is processed internally.
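To make the second heatmap more concrete: the coefficient of variation is the standard deviation divided by the mean. The following Python sketch shows one way to compute it per city for the readings of a single time frame. It is only an illustration, not the code behind the dashboard; the function names, the `(city, value)` input format and the choice of the population standard deviation are our assumptions for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return stddev / |mean|, or None when it is undefined."""
    if len(values) < 2:
        return None  # a single reading says nothing about variation
    m = mean(values)
    if m == 0:
        return None  # division by zero, the ratio is undefined
    # Assumption: population standard deviation; a sample
    # standard deviation would be an equally valid choice.
    return pstdev(values) / abs(m)


def cv_per_city(readings):
    """Group (city, value) pairs of one time frame and compute the CV per city."""
    by_city = defaultdict(list)
    for city, value in readings:
        by_city[city].append(value)
    return {city: coefficient_of_variation(vals) for city, vals in by_city.items()}


# Hypothetical readings from one 30-second time frame.
frame = [("city_a", 10.0), ("city_a", 12.0), ("city_a", 14.0), ("city_b", 20.0)]
print(cv_per_city(frame))
```

One caveat: for temperatures in degrees Celsius the mean can be close to zero, which inflates the ratio, so the indicator is easier to interpret for quantities with a natural zero point such as humidity.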
Some implementation details
For the technically inclined: Our implementation is based on Microsoft’s Azure, one of many cloud computing platforms available. Specifically, we used three main components: Event Hubs, Stream Analytics and WebSockets.
- We started building our solution using Azure Event Hubs, a highly scalable publish-subscribe infrastructure. It can take in millions of events per second, so we have room to grow with only 170’000 data points per hour. Every ten seconds, we pull the raw data from the official data sources and push the resulting data stream to an Azure Event Hub.
- For the real-time analysis, we tried Azure Stream Analytics, a fully managed stream processing solution that can take event hubs as an input source. With Stream Analytics, you can analyze incoming data within a certain time window and immediately push the result to another event hub. In our example, Stream Analytics aggregates the raw signal data every 3 to 4 seconds and calculates the average, minimum, maximum and standard deviation over a 30-second time frame for each city.
- Finally, there is a server component that relays the event hub stream over WebSockets. With WebSockets, we can establish a direct connection between the data stream and a (modern) browser client.
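The windowed aggregation in the second step can be illustrated outside of Azure. The following Python sketch is not Stream Analytics code and not our actual implementation; it only mimics the idea of summarizing the last 30 seconds of readings for one city each time a result is requested. The class name, method names and the use of plain numeric timestamps are hypothetical, and the sketch assumes that readings arrive in timestamp order.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingWindowStats:
    """Keep the most recent `window_seconds` of readings for one city."""

    def __init__(self, window_seconds=30.0):
        self.window_seconds = window_seconds
        self._readings = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        # Assumption: timestamps are non-decreasing.
        self._readings.append((timestamp, value))

    def snapshot(self, now):
        """Drop readings that fell out of the window and summarize the rest."""
        cutoff = now - self.window_seconds
        while self._readings and self._readings[0][0] <= cutoff:
            self._readings.popleft()
        values = [value for _, value in self._readings]
        if not values:
            return None  # no signal in the window
        return {
            "count": len(values),
            "avg": mean(values),
            "min": min(values),
            "max": max(values),
            "stddev": pstdev(values),  # population standard deviation
        }


window = SlidingWindowStats(window_seconds=30)
window.add(0, 10.0)
window.add(10, 20.0)
window.add(20, 30.0)
print(window.snapshot(now=20))
```

Calling `snapshot` every few seconds yields overlapping 30-second summaries, similar in spirit to the aggregates described above. A `None` result or a low `count` is itself a quality signal, because it means few or no sensors reported in that window.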
Admittedly, this is a very early version of a live quality assessment system for real-time sensor data. However, it shows the potential: we can define a set of data quality indicators, like the number of active sensors or the coefficient of variation. These indicators can be computed as the data streams into the system. Using Azure Stream Analytics, we could incorporate tens of thousands of sensors instead of only a hundred, and we’d still have the same performance without changing a line of code.
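To put the scaling argument into numbers, here is a rough back-of-the-envelope calculation in Python. It uses the approximate figures mentioned in this post (7 cities, about 14 sensor nodes per city, about 5 measurements every 10 seconds); the figure of 10,000 sensors is just a hypothetical example for "tens of thousands", and the function name is ours.

```python
CITIES = 7
SENSORS_PER_CITY = 14         # "about 14 sensor nodes in every city"
MEASUREMENTS_PER_REPORT = 5   # "approximately 5 measurements"
REPORT_INTERVAL_SECONDS = 10  # one report every 10 seconds


def data_points_per_hour(sensors):
    """Estimate the hourly data volume for a given number of sensors."""
    reports_per_hour = 3600 // REPORT_INTERVAL_SECONDS
    return sensors * MEASUREMENTS_PER_REPORT * reports_per_hour


current_sensors = CITIES * SENSORS_PER_CITY            # 98 sensors
current_rate = data_points_per_hour(current_sensors)   # 176400 per hour
scaled_rate = data_points_per_hour(10_000)             # hypothetical larger network

print(f"today: {current_rate} data points/hour, {current_rate / 3600:.0f} per second")
print(f"10,000 sensors: {scaled_rate} data points/hour, {scaled_rate / 3600:.0f} per second")
```

With these assumptions, today’s network produces roughly 176,000 data points per hour, or about 49 per second, in line with the figure quoted above. Even 10,000 sensors would produce only about 5,000 data points per second, well below the millions of events per second that Event Hubs can take in.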
Of course, there is room for improvement:
- Ideally, the sensors would push their data directly into the Azure Event Hub instead of relying on a polling service as an intermediary.
- Exploiting historical data, like a comparison between the live data and data from a week ago.
- Integrating more and different data sources for the data analysis.
Do you have any questions? Send me an e-mail at firstname.lastname@example.org or leave a comment.