Beyond standard visualizations

You may have seen the piece in the Tagesanzeiger data blog or elsewhere in the press or online: In my role as Visiting Researcher, I recently published two maps on global internet access together with colleagues at the Oxford Internet Institute:

The internet population in 2013 (Geonet Project, Oxford Internet Institute)
Relative growth of the population with internet access, 2009–2013 (Geonet Project, Oxford Internet Institute)

You can read more about the maps in the following two blog posts (in English) or at the Tagesanzeiger:

Here I would like to say a bit more about the type of visualization. The maps above distort the area of each country so that it becomes proportional to the number of people with internet access. In the southern Atlantic, the maps contain a small undistorted reference map, which makes it even clearer how strongly, for example, Europe is overrepresented and Africa underrepresented online.

This type of map is known as a cartogram (in German: Kartenanamorphose). In this case they are hexagonal cartograms, meaning that each country is assembled from a number of small hexagons as building blocks. I have already explained the approach in detail elsewhere, but I have since implemented a few further optimizations. Keeping the distortion of the countries' shapes as small as possible requires a fair amount of manual work that the common cartogram algorithms cannot take off your hands.
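To make the allocation step concrete: each hexagon stands for a fixed number of people online, so a country's hexagon count is simply its internet population divided by that value. Here is a minimal sketch in Python with purely illustrative figures (it is not the production workflow described below, which relied on specialized cartogram software and ArcGIS):

```python
# Each hexagon represents a fixed number of people online; a country's hexagon
# count is therefore proportional to its internet population.
PEOPLE_PER_HEXAGON = 1_000_000

internet_users = {          # illustrative values, not the GeoNet dataset
    "Switzerland": 7_000_000,
    "Nigeria": 67_000_000,
    "Japan": 109_000_000,
}

def hexagon_count(users: int) -> int:
    """Round to the nearest whole hexagon, but keep at least one per country."""
    return max(1, round(users / PEOPLE_PER_HEXAGON))

for country, users in internet_users.items():
    print(f"{country}: {hexagon_count(users)} hexagons")
```

Arranging those hexagons into recognizable country shapes is the part that still needs the manual work mentioned above.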

Many people find such visualizations immediately engaging. They can come across as quite striking, for instance when the underlying quantity is very unevenly distributed. One downside of such visualizations is certainly that judging relative sizes does not allow for precise estimates. For example, it would not be impossible, but very difficult, to say how many people in Japan are online or – in this visualization of the Swiss cantons and cities – how many people live in the canton of Grisons. In both cases, however, that was clearly never one of my goals.

I produced the automatically computable distortions of the countries in the maps above with specialized software. For mapping the countries onto hexagons, minimizing the distortions, placing the labels manually (!) and the entire final layout I used ArcGIS. The finished maps, I think, no longer look like standard "GIS maps", though.

With this post I would like to encourage you to question (geo)graphical conventions and 'best practices' from time to time. Perhaps something can be visualized better than with a choropleth or symbol map, possibly topped off with a standard legend. 'Better' can mean, for example,

  • that a fact can be communicated more precisely, or
  • that a visualization sparks the audience's interest, or
  • that a visualization prompts further questions, or
  • that a visualization conveys selected points not necessarily precisely, but strikingly in their broad strokes.

Would you perhaps like to free your own visualizations from the boring standard look, too? –
I would be happy to advise you.

 

Article: The future of GIS is smart and connected

Recently, Ivo Leiss wrote in our blog about the evolution of GIS and about GIS 5.0. In GIS 5.0, the Internet of Things, indoor navigation and real-time information systems will be key elements. GI systems 5.0 will be smarter and more connected than the systems we know today. This also brings new infrastructure requirements with it. How the individual elements interact is explained by Ivo Leiss in the blog post linked above using this graphic (adapted from Porter and Heppelmann):

If you, like us, are interested in the future of information systems, I recommend reading our Geomatik Schweiz article titled "GIS 5.0 – Smart und vernetzt" (PDF). In it, Ivo Leiss interviews three of our staff (including me) on the topic. The three interviewees approach the subject from three different directions: the Internet of Things, insights gained from data analysis, and cloud technologies.

GIS 5.0 – Smart und vernetzt (PDF), published in Geomatik Schweiz 5/2015.

GIS 5.0 – Smart and connected

Recently I came across an interesting article by Dave Peters. He outlines the evolution of GIS in four development phases:

  1. In the early 1980s, GIS were based primarily on scripts. Using scripts, GI specialists cleaned, edited and visualized spatial data. Some readers might recall the ARC/INFO era and its scripting language, the Arc Macro Language (AML).
  2. About 20 years later, at the end of the 1990s, the first GUI-centric, object-oriented GIS appeared on the stage (for example, ArcGIS Desktop in 1998). This second step, with its more efficient programming technique, was enabled by more powerful hardware.
  3. New technologies for providing data and services emerged with the rapid advent and development of the Web. A building block of these service-oriented architectures (SOAs) was, for example, the Web Map Service (WMS) specification, whose version 1.0 was adopted in 2000 (a sketch of such a request follows this list).
  4. Finally, the virtualization of hardware and the centralization of computing centers initiated the fourth phase, leading to cloud-based GIS portals. Storage space and computing power have become scalable commodities. ArcGIS Online, launched in 2012, is a prominent example of this fourth phase.
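As a small illustration of such a service interface, here is a hedged sketch of a WMS GetMap request in Python; the endpoint and layer name are placeholders rather than a real service:

```python
import requests

# Hypothetical WMS endpoint and layer; any standards-compliant WMS accepts this parameter set.
WMS_URL = "https://example.org/wms"

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "landcover",          # placeholder layer name
    "STYLES": "",
    "SRS": "EPSG:4326",
    "BBOX": "5.9,45.8,10.5,47.8",   # minx, miny, maxx, maxy (roughly Switzerland)
    "WIDTH": 800,
    "HEIGHT": 400,
    "FORMAT": "image/png",
}

response = requests.get(WMS_URL, params=params, timeout=30)
response.raise_for_status()

with open("map.png", "wb") as f:
    f.write(response.content)
```

Only the service URL, the layer names and the spatial reference change from server to server; the request pattern stays the same.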

Now the question is: what comes next?

The steps in GIS software evolution. What's next?

Smart and connected systems

From the past we can learn: new technological capabilities lead to new applications, and these substantially influence the further evolution of GIS. Among the contenders for the technologies and developments most relevant to GIS, I see:

  • indoor navigation,
  • the Internet of Things (IoT) and
  • real-time systems.

Future GIS applications will be increasingly smart and networked. They will require a technical infrastructure composed of several layers: embedded components, network communications, a cloud-based platform or system, tools providing authentication and authorization, and gateways to include external data sources as well as in-house data (see the figure below, adapted from Porter and Heppelmann).

The architecture of future smart, connected GIS applications (adapted from Porter and Heppelmann)
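To give a rough feel for how these layers interact, here is a minimal, hypothetical sketch in Python: an embedded component sends a geolocated reading over the network to a cloud gateway and authenticates with a token. The endpoint URL, token and payload fields are assumptions for illustration only, not part of any real system described here.

```python
import json
import time

import requests

GATEWAY_URL = "https://example-gis-cloud.invalid/api/v1/readings"  # hypothetical cloud gateway
API_TOKEN = "replace-with-a-real-token"                            # authentication/authorization layer

def send_reading(sensor_id: str, lon: float, lat: float, temperature_c: float) -> None:
    """Push one geolocated reading from the embedded component to the cloud platform."""
    payload = {
        "sensor_id": sensor_id,
        "timestamp": time.time(),
        "location": {"lon": lon, "lat": lat},
        "temperature_c": temperature_c,
    }
    response = requests.post(
        GATEWAY_URL,
        data=json.dumps(payload),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        timeout=10,
    )
    response.raise_for_status()

if __name__ == "__main__":
    send_reading("zurich-001", 8.5417, 47.3769, 21.4)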

The IT Division of Ernst Basler + Partner (EBP Informatics) has already amassed solid experience with the components of such a system (see our reference projects). In our blog posts, too, we engage with these future developments, most recently with regard to the real-time quality assessment of data streams.

Do you have any questions or comments on these topics? We would like to hear from you!

 

GIS 5.0 – Smart and connected

Recently I came across an interesting article by Dave Peters. He describes the evolution of GIS software in four development steps:

  1. In the early 1980s, GIS were based primarily on scripts, which were used to clean, edit and visualize data. Some readers will probably still remember the ARC/INFO era and its scripting language AML (Arc Macro Language).
  2. Only at the end of the 1990s – almost 20 years later – did the first object-oriented GIS products come onto the market (e.g. ArcGIS Desktop in 1998). This second development step, with its more efficient programming technique, was made possible by more powerful hardware.
  3. In the course of the rapid development of the Web, technologies then emerged to make data and services broadly available. One building block of these service-oriented architectures is, for example, the Web Map Service (WMS) specification, whose version 1.0 was adopted in 2000.
  4. The virtualization of hardware and the centralization of computing centres initiated the fourth development step and led to cloud-based GIS portals. Storage space as well as computing power can now be provisioned and released according to current needs. ArcGIS Online, launched in 2012, is a prominent example of this.

Now, of course, the question is what comes next.

The four development steps in the evolution of GIS software.

Smart and connected systems

From the past we learn: new technological possibilities lead to new applications and also strongly influence the further development of GIS. Among the technologies and developments relevant to the future of GIS I count

  • indoor navigation,
  • the Internet of Things (IoT), and
  • real-time systems.

Future GIS applications will be increasingly smart and connected. They require a new technical infrastructure composed of several layers. These include embedded systems, network communication and a cloud-based system, but also tools for ensuring data security, a gateway for integrating external information sources, and the integration of a company's own enterprise systems (see the figure below, adapted from Porter and Heppelmann).

The architecture of smart, connected GIS applications.

The IT Division of Ernst Basler + Partner (EBP Informatics) has already gained experience in several of these areas (see our reference projects). In our blog posts, too, we regularly engage with these future topics, for example the assessment of data quality for real-time sensors.

Information visualization: small multiples

How do you visualize a lot of information so that it can be grasped and compared at a glance, at least roughly? For non-spatial data, one option is a so-called scatterplot matrix:

Scatterplot matrix of Anderson's iris dataset (CC-BY Wikipedia user Indon)

A scatterplot matrix shows a set of scatterplots of the same multidimensional dataset. It is easy to produce in R, for example, but to my knowledge not supported in, say, Excel. The attributes of the visualized dataset are usually labelled along the diagonal. Sometimes the diagonal also contains histograms or distribution functions of the respective attribute. In a scatterplot matrix you can thus compare all visualized attributes with one another and examine how they co-vary. The matrix therefore offers a wealth of information in a small space.
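The post mentions R; as a sketch of the same idea in Python, seaborn produces a scatterplot matrix of the iris dataset shown above (which ships with seaborn as an example dataset) in a few lines:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Load Anderson's iris dataset, bundled with seaborn as an example dataset.
iris = sns.load_dataset("iris")

# Scatterplot matrix: every numeric attribute against every other one,
# with histograms on the diagonal and species encoded by colour.
sns.pairplot(iris, hue="species", diag_kind="hist")
plt.show()
```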

In principle, a scatterplot matrix embodies the concept of the small multiple, which was popularized by Edward Tufte. A small multiple is (somewhat simplified and generalized) a collection of small, very similarly designed charts that offer different views of the same dataset. This principle can also be applied very well to the visualization of spatial information. Similar to the scatterplot matrix, you can visualize different attributes, or you can show the spatial pattern of categories.

I have implemented this second idea as an example using the freely available tree cadastre of the city of Zurich:

In this example, I displayed the same context information in all the individual panels: topography with hillshading (which allows estimating elevation, slope and aspect), rivers, lakes, and the city boundary of Zurich. The distribution of tree genera is shown in red. A genus comprises several plant species; "Acer" as a genus thus covers various species of maple. The common name therefore does not refer to one specific species but always stands for a group of tree species.
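A sketch of how such a small-multiple grid could be built in Python with pandas and matplotlib; the file and column names of the tree cadastre extract are assumptions, and the context layers (hillshade, water bodies, city boundary) are omitted:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical extract of Zurich's open tree cadastre: one row per tree,
# with coordinates and the genus. File and column names are assumptions.
trees = pd.read_csv("baumkataster.csv")  # columns: "easting", "northing", "genus"

genera = sorted(trees["genus"].unique())
n_cols = 5
n_rows = -(-len(genera) // n_cols)  # ceiling division

fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 3 * n_rows),
                         sharex=True, sharey=True, squeeze=False)
panels = axes.ravel()

for ax, genus in zip(panels, genera):
    subset = trees[trees["genus"] == genus]
    ax.scatter(subset["easting"], subset["northing"], s=1, color="red")
    ax.set_title(genus, fontsize=8)
    ax.set_aspect("equal")
    ax.set_xticks([])
    ax.set_yticks([])

for ax in panels[len(genera):]:  # hide unused panels in the grid
    ax.axis("off")

plt.tight_layout()
plt.show()
```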

The context information in the figure above allows, at least in theory, certain interpretations regarding the distributions ("in theory" because the distribution of trees in a city can, even in the best case, only partly be explained by ecological factors. Just as important are probably the time of planting and fashions in urban greenery – in other words, the creative freedom of Grün Stadt Zürich 🙂).

In my opinion, the small-multiple approach works very well for spatial data too. I like to use it to visualize multi-category data. Imagine all tree genera shown in a single map, for example as clearly overlapping points in different colours! Or, somewhat more advanced, the visualization implemented as a set of bivariate or multivariate choropleth maps (see the excellent post by Joshua Stevens: http://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map). With a sizeable number of tree genera, both types of display would be incredibly difficult to read.

The small-multiple display, by contrast, solves the task effectively and efficiently. The focus here is of course not on detailed accuracy but on the big picture and on easy comparison.

Since the wealth of data in the tree cadastre excited me so much, I also created a somewhat more artistic variant:

Interestingly, comparing the distribution patterns of tree genera works better in this much-reduced form, since the absence of the context information lets us concentrate entirely on those patterns.

If you like the visualization: clicking on the image above gives you a high-resolution raster graphic (PNG file), and with a click here you can download a good-quality PDF file (A1 format).

 

Internet of Things: Live Data Quality Assessment for a Sensor Network

TL;DR: We believe that connected devices and real-time data analytics are the next big things in GIS. Here is a live dashboard for a sensor network in 7 cities around the world.

Geoinformation systems have evolved quite rapidly in recent years and the future seems more exciting than ever: all major IT trends such as the Internet of Things (IoT), big data or real-time systems are directly related to our professional domain. Smart devices are spatially located or even moving in time; big data and real-time systems almost always need locational analytics. This is why we got interested when we heard about the „Sense Your City Art Challenge“, a competition to make sense of a network of DIY sensors spread over 7 cities on 3 continents. To be honest, our interest was drawn not so much to the „art“ aspect; in the end, we are engineers and feel more at home with data and technology. And there is real-time sensor data available within the challenge: about 14 sensor nodes in every city deliver approximately 5 measurements every 10 seconds, such as temperature, humidity or air quality. The sensor data is freely available. When we looked at the numbers, we realized that the data had some surprising properties – for example, the temperature varies quite a bit within one city.

Screenshot of our story map for Sense Your City.

 

Our goal: Live data quality assessment for a sensor network

So, we took the challenge a bit differently and more from an engineering perspective: how do you implement a real-time quality assessment system for sensor data? As an example, we took the following questions, which need to be re-evaluated as new sensor data comes in (a minimal sketch of these indicators follows the list):

  • Are there enough sensors providing data for each city?
  • How much do the sensor measurements vary within a city?
  • How do the sensor measurements compare to external data?
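As a rough, offline illustration of the first two indicators (not the Azure pipeline described below), here is a sketch in Python with pandas; the column names of the sensor table are assumptions:

```python
import pandas as pd

# Hypothetical snapshot of the last 30 seconds of sensor readings.
# Assumed columns: "city", "sensor_id", "temperature".
readings = pd.read_csv("sense_your_city_last_30s.csv")

per_city = readings.groupby("city")["temperature"]

quality = pd.DataFrame({
    # Indicator 1: how many signals arrived per city in the time window?
    "signal_count": per_city.count(),
    # Indicator 2: how strongly do measurements vary within a city?
    # Coefficient of variation = standard deviation / mean.
    "coeff_of_variation": per_city.std() / per_city.mean(),
    "min_temp": per_city.min(),
    "max_temp": per_city.max(),
    "mean_temp": per_city.mean(),
})

print(quality.round(2))
```

Comparing against external data (the third question) would additionally require a reference source such as a weather service, which is omitted here.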

Our solution: A live dashboard with real-time statistics 

My colleague Patrick Giedemann and I started late last week and developed a live dashboard with real-time statistics for the sensor network of seven cities. The dashboard is implemented with a story map containing one world view and seven views on city-level. The components of the views are:

  • A heatmap showing a condensed view of the analysis for each of the cities, labeled with the numbers 2 to 8. For example, we want to visualize the number of sensor values for each city within a time frame of 30 seconds. The darker the blue bucket, the more sensor signals we got; light buckets indicate a low number of signals in the time frame.
  • Another heatmap, which shows the coefficient of variation for each city, again within a time frame of 30 seconds.
  • A gauge showing the number of sensor signals for a city, and a line chart with the minimum, maximum and average temperature for a city.

We haven’t yet got around to showing real weather data, although it is already processed internally.

Some implementation details

For the technically inclined: Our implementation is based on Microsoft’s Azure, one of many cloud computing platforms available. Specifically, we used three main components: Event Hubs, Stream Analytics and WebSockets.

Graphic from the Microsoft Azure documentation. We used Event Hubs, Stream Analytics and WebSockets instead of a data store.

  • We started building our solution using Azure Event Hubs, a highly scalable publish-subscribe infrastructure. It can take in millions of events per second, so with only 170’000 data points per hour we have plenty of room to grow. Every ten seconds, we pull the raw data from the official data sources and push the resulting data stream to an Azure Event Hub (a minimal sketch of this step follows the list).
  • For the real-time analysis, we tried Azure Stream Analytics, a fully managed stream processing solution which can take event hubs as an input source. With Stream Analytics, you can analyze incoming data within a certain time window and immediately push the result back to another event hub. For our example, Stream Analytics aggregates the raw signal data every 3 to 4 seconds and calculates the average, minimum, maximum and standard deviation over 30 seconds within a city.
  • Finally, there is a server component which transforms the event hub into WebSockets. With WebSockets, we can establish a direct connection between the data stream and a (modern) browser client.
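For the first step (polling the source and pushing into an Event Hub), here is a minimal sketch in Python using the current azure-eventhub SDK; the original 2015 implementation predates this SDK, and the source URL and connection string are placeholders:

```python
import json
import time

import requests
from azure.eventhub import EventHubProducerClient, EventData  # pip install azure-eventhub

DATA_URL = "https://example.sense-your-city.invalid/api/latest"  # placeholder data source
CONNECTION_STR = "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=..."  # placeholder
EVENTHUB_NAME = "sensor-readings"

producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR, eventhub_name=EVENTHUB_NAME
)

while True:
    # Pull the latest raw readings from the official data source ...
    raw = requests.get(DATA_URL, timeout=10).json()

    # ... and push them to the Event Hub as one batch of events.
    batch = producer.create_batch()
    for reading in raw:
        batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)

    time.sleep(10)  # poll every ten seconds
```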

What’s next?

Admittedly, this is a very early version of a live quality assessment system for real-time sensor data. However, it shows the potential: we can define a set of data quality indicators, such as the number of active sensors or the coefficient of variation. These indicators can be computed as the data streams into the system. Using Azure Stream Analytics, we could incorporate tens of thousands of sensors instead of only a hundred and we would still get the same performance without changing a line of code.

Of course, there is room for improvement:

  • Ideally, the sensors would push their data directly into the Azure Event Hub instead of going through a polling service as an intermediary.
  • Exploiting historical data, for example by comparing the live data with data from a week ago.
  • Integrating more and different data sources for the data analysis.

Do you have any question? Send me an e-mail at stephan.heuel@ebp.ch or leave a comment.

Let there be light: Data visualization with SAP Lumira

GIS and Business Intelligence (BI) are buzzwords you hear together increasingly often (see also our articles on GISconnector, which we consider a low-cost, easy-entry BI solution). Inspired by this article from iX magazine, I decided to have a look at the „self-service BI“ solution SAP Lumira. Lumira is an analysis and visualization tool; SAP offers a freely downloadable version, and there is also a standard edition with more features that sets you back almost $1,000. The workflow in the application is simple: you import some data and prepare it, create visualizations and publish them.


Data Source for all visualizations: World Bank (© 2010 The World Bank Group)

Prepare
In the free version you can use Excel sheets or CSV data, copy data from the clipboard or connect to an SAP HANA One database. The full version also lets you access databases with SQL queries. For my trial I found an Excel table about global energy use provided by the World Bank (downloaded from www.visualizing.org). After loading the data you can prepare it for visualization: filter, calculate new values or join different datasets together.

You can also create a so-called geographic hierarchy to visualize the data on a map. A geographic hierarchy is a kind of geocoding: geographic place names are matched against an internal database to add location to your data records. The good thing is that it clearly identifies records that could not be matched automatically and then offers you the possibility to select possible matches from a list. Unfortunately, you can only choose from this non-extendable list. For some reason it did not suggest Slovakia as a match for „Slovak Republic“, which would have left changing the country name in the input data as the only remaining option. Luckily, I also had country codes in my dataset; these worked much better and are obviously the better practice if you have them available.
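The underlying issue is easy to reproduce outside Lumira: matching by country name is brittle, matching by ISO code is robust. A small, self-contained sketch with a toy lookup table (not Lumira's internal database):

```python
# Toy geocoding table keyed by ISO alpha-3 code; a real tool ships a full database.
COUNTRIES = {
    "SVK": {"name": "Slovakia", "lat": 48.7, "lon": 19.7},
    "LUX": {"name": "Luxembourg", "lat": 49.8, "lon": 6.1},
}

# Names as they appear in the source table do not always match the lookup names.
NAME_ALIASES = {"Slovak Republic": "SVK", "Luxembourg": "LUX"}

def geocode(record):
    """Prefer the ISO code; fall back to an alias table for divergent country names."""
    code = record.get("iso_code") or NAME_ALIASES.get(record.get("country", ""))
    return COUNTRIES.get(code)

print(geocode({"country": "Slovak Republic"}))              # matched via alias
print(geocode({"country": "Slovakia", "iso_code": "SVK"}))  # matched via code
```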


 

Visualize
Now comes the really cool part: drag-and-drop visualization. Just select one of the many available chart types, drop your measures and dimensions on the X and Y axes, apply some filters to the data and your diagram is ready. This is really comfortable (and I can tell, having recently spent hours producing some rather simple graphs with R and ggplot).

Apart from classic charts you can also create maps. These are nice for a first impression, offering zooming, panning and mouse-over information. But overall the maps are pretty basic, with almost no options to influence the display. The full version offers the possibility to use ArcGIS Online maps, which brings a broader range of functionality.
For my trial I tested only simple visualizations. But Lumira also offers some fancier variants, e.g. heatmaps, network diagrams or tag clouds.

Compose
After creating visualizations to make your point, you can aggregate them into a report or a „board“. The nice thing is that the charts remain interactive. I haven’t yet tried all the possibilities, but you could probably do similar things as with story maps.

Share
The next step is to make your visualizations available to others. Unfortunately, there is (at least in the free version) no option to export the graphics as PDF or image files. This would be useful in order to include the graphics in reports and presentations.

One possible solution is to upload your board to the Lumira Cloud, which is very neat: you can then provide access to individual users or to the whole world. In my case, however, this proved to be an unexpected hassle, so you can only enjoy my screenshots for the moment.

A few words about the data

For this little test I was more interested in the tool than in the data itself. Some things look quite interesting, though: whereas the per-capita energy consumption of most countries increased between 1970 and 2005, Luxembourg shows a massive decrease. My first hypothesis for this unexpected outcome is the decline of heavy industry, but I have not yet found confirmation for it. Perhaps you, dear reader, know more?

GISconnector: The beginning of a beautiful friendship between ArcGIS and Excel

As of September 2014, we at Ernst Basler + Partner distribute the application GISconnector for Excel in Switzerland. GISconnector is a software product created by Germany-based GI Geolabs. In this article, we highlight the new possibilities that arise through the close interconnection of ArcGIS and Excel. Contact us if you have any questions or if you’re interested in a demonstration in a screen-sharing session.

Stephan Heuel of Ernst Basler + Partner and Matthias Abele of GI Geolabs after signing the partnership agreement.

But first things first: What does GISconnector offer to you? It combines the capabilities of ArcGIS Desktop and Microsoft Excel in a smart way. The best thing about it is that you can do all the attribute-related steps in your GIS workflow seamlessly in Excel. This way, you have the power of a full-fledged spreadsheet software at your fingertips, perfectly integrated within ArcGIS. As a result, working with attribute data is much easier and takes a lot less time. You can find a comprehensive list of GISconnector’s functionality here.


Working with GISconnector usually follows this pattern: I load a feature class (Geodatabase or Shapefile, whatever floats your boat) as a layer in ArcMap. Using GISconnector I can conveniently export the layer’s attribute data to Excel and create a connection between the two programs in the same process. From this moment on, I can easily swap selections, definition queries and filters between ArcGIS and Excel. The same applies to changes in attribute values and attribute names as well as adding attributes. A toolbar lets you control GISconnector both from within ArcMap and Excel.

GISconnector toolbar in Excel

Sending selections, data et cetera from one application to the other takes just one click. The image below shows a simple example of GISconnector in action: A feature class of Swiss cantons in ArcMap and the connected attribute table in Excel. I selected some of the southern cantons (marked with turquoise borders) in ArcMap and then sent this selection to Excel as a filter. Thus, Excel only shows the three rows related to the selected map features.

ArcMap and Excel linked by GISconnector

Further, the connection between the two applications lets ArcGIS users access Excel’s advanced bag of tricks, ranging from complex functions and AutoFilters to conditional formatting and dynamically updated charts. I will write about such an example in an upcoming article.

Have we sparked your interest in the GISconnector?

Learn more about GISconnector’s functionality from the makers of the software.

Watch a demo video. This one shows how to transfer selections and filters:

 

Test the product free of cost.

Contact us if you have any questions or if you’re interested in a demonstration via screen sharing.

GISconnector: The beginning of a beautiful friendship between ArcGIS and Excel

Since September 2014, we at Ernst Basler + Partner have been the Swiss distribution partner for GISconnector for Excel. GISconnector is a software product by the German company GI Geolabs. In the following article we present the new possibilities that arise from the close interlinking of ArcGIS and Excel. Contact us if you have any questions or for a no-obligation demonstration via screen sharing.

Stephan Heuel of Ernst Basler + Partner and Matthias Abele of GI Geolabs signing the distribution partnership agreement

But first things first: what does GISconnector offer? It intelligently combines the capabilities of Esri ArcGIS Desktop with those of Microsoft Excel. The clever part is that all work steps that concern attribute data, and that normally take place in the attribute table in ArcGIS, can be done in Excel. This gives you all the possibilities of a modern spreadsheet application, perfectly integrated into ArcGIS. The result is significantly easier work with attribute data and, for most of us, considerable time savings. A comprehensive description of GISconnector's functionality can be found here.


The basic workflow with GISconnector for Excel looks like this: I load a feature class as a layer in ArcMap; whether it comes from a geodatabase or a shapefile does not matter. With GISconnector, I can conveniently export the attribute data to Excel and establish a connection with the Excel file in the same step. From that point on, I can transfer selections, definition queries and filters from ArcGIS to Excel and from Excel to ArcGIS. The same applies to changes of attribute values, the renaming of attributes and the creation of additional attributes. GISconnector is controlled via a toolbar both in ArcMap and in Excel.

GISconnector toolbar in Excel

Transferring selections, data and so on from one program to the other takes just one click. As a simple example, the following figure shows a feature class of the Swiss cantons in ArcMap and the connected attribute table in Excel. I selected the cantons outlined in blue in ArcMap and then transferred the selection to Excel as a filter.

ArcMap and Excel, linked by GISconnector

Connecting the two programs opens up the entire Excel bag of tricks to ArcGIS users: from complex formulas to AutoFilters and conditional formatting to dynamic charts. I will report on such a use case in more detail in the next article.

Have we sparked your interest in GISconnector?

Find out about the full range of functionality on the manufacturer's website.

Watch demo videos, for example the following one showing basic functions:

 

Get a free trial version.

Contact us if you have any questions or for a no-obligation demonstration via screen sharing.

The Data Worker’s Manifesto

straumann-geobeer8-slide-1

Last week I gave a talk at the 8th instalment of the GeoBeer series, held on EBP’s Zurich-Stadelhofen premises and sponsored by EBP and Crosswind. It was titled "State of the Union: Data as Enabling Tech‽".

You can check out the whole slide deck on my private website. (The slides are made with impress.js and are best viewed in Chrome. Please ignore my horrible inline CSS.)


straumann-geobeer8-slide-2

I’m quite sure it’s not best practice to give one’s talk an unintelligible title. Nevertheless, that’s what I did, so let me explain what the different parts mean:

I chose „state of the union“ as a fancy way of expressing that I’m directing my talk primarily at fellow geoinformation and data people.

With „data“ we usually refer to raw observations of some phenomenon. We’ll discuss later how helpful that definition turns out to be.

„Enabling tech“ would usually expand to „enabling technology“; the term is used to denote a technical development that makes novel applications possible in the first place. However, in the context of this talk it may be worthwhile to also keep the second potential meaning of the stub „tech“ – „technique“ – in mind.

Finally, the ‽ is called an interrobang and nicely reflects the semantic ambivalence of combining ? and ! into one punctuation mark.


straumann-geobeer8-slide-3

Sometime in the last decade, we as a society have moved from a situation where data was usually scarce to one where (many forms of) data are abundant. Where before, the first step of analysis was often one of interpolation between valuable data points, we now filter, subsample, and aggregate our data. Not all domains are the same in this respect, obviously. But I think the generalisation pretty much holds, as (often ill-applied) labels such as „big data“ or „humongous data“ indicate. (Well, the latter is obviously a joke; but think about why it works as such.)

Big drivers of this development are a) the Web and its numerous branches and platforms and b) smartphones, tablets, phablets and what have you, or more broadly speaking: embedded sensors, GPS loggers, tracking and fleet management systems, automotive sensors, wearables, ’self-tracking‘ or ‚quantified-self‘ technology, networked hardware such as appliances (think Internet of Things) and the like.

In what follows I’m going to talk primarily about crowdsourced data. (In other contexts, crowdsourced (geographic) data is also called, e.g., Volunteered Geographic Information, VGI (a term fraught with problems), or User-Generated Content, UGC.) But some of the assertions also hold for data in general.


straumann-geobeer8-slide-4

straumann-geobeer8-slide-5

Crowdsourced data, i.e. data that:

– is gathered from many contributors,

– in a decentralised fashion,

– following (at best) informal rules and protocols,

– voluntarily, unknowingly or with incentives,

has some issues.

The large-scale advent of this crowdsourced data of course coincides with the development of the so-called Web 2.0 (in German also referred to as the ‚participation Web‘), where anybody could be not just a consumer but also (at least in theory) a producer, or: a produser. Or so we were told.


straumann-geobeer8-slide-6

But: crowdsourced data is biased

This map shows OpenStreetMap (OSM) node density normalised by inhabitants (compiled by my OII colleagues Stefano de Sabbata and Mark Graham).

Assuming (with some simplification) that the presence of people drives the build-up of infrastructure, in an ideal world this map would show a uniform colour everywhere. However, there are regions where the relative data density in OSM exceeds that of other regions by 3–4 orders of magnitude! Compare this to the density of place names in the GeoNames gazetteer!

Clearly, offering an „open platform“ and encouraging participation is not enough to really level the playing field in the user-generation of content. In some regions, people might not have the means (spare time, economic freedom, hardware, software, education, technical skills, access to stable (broadband) internet, motivation) to participate, or they might, for example, have reservations about this kind of project or the organisations behind it.

Spatially heterogeneous density is just one example of the bias we find in crowdsourced data. Another one is termed user contribution bias, where a very small proportion of contributors (think Twitter users, Flickr photographers, Facebook posters, …) creates a large proportion of the data. Depending on the platform, we see very lopsided distributions, with a few percent of users being behind a large share of the content. In his Master’s thesis, Timo Grossenbacher found that in his Twitter sample, 7% of the users created 50% of the tweets. Despite all techno-optimism: clearly, not everyone is a produser, and clearly not all contributors create equal amounts of content!
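To make user contribution bias concrete, here is a small sketch (with synthetic, heavy-tailed numbers, not Grossenbacher's data) that computes which share of users accounts for half of the content:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic example: number of tweets per user, drawn from a heavy-tailed distribution.
tweets_per_user = rng.pareto(a=1.2, size=100_000) + 1

# Sort users from most to least active and accumulate their share of all tweets.
sorted_counts = np.sort(tweets_per_user)[::-1]
cumulative_share = np.cumsum(sorted_counts) / sorted_counts.sum()

# Smallest fraction of users that already produces 50% of the content.
users_for_half = (np.argmax(cumulative_share >= 0.5) + 1) / len(sorted_counts)
print(f"{users_for_half:.1%} of users account for 50% of the tweets")
```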


straumann-geobeer8-slide-7

Talking of different kinds of bias: OSM, for example, has also been found to be sexist. OSM contributors (as in many crowdsourcing initiatives) tend to be young, male and technologically minded, with above-average education. Narrow groups of contributors may, inadvertently or consciously, favour their own interests when creating content.

OSM’s „bottom-up data model“ (basically, the community discusses and decides what is mapped how) gives contributors allocative power, i.e. what most people (or the most industrious contributors?) adopt as their practice has good chances to evolve into community (best?) practice.


straumann-geobeer8-slide-8

 


straumann-geobeer8-slide-9

Further, some patterns in crowdsourced data may be very surprising.

One example this talk has already touched upon is user contribution bias, where a small group dominates the crowdsourcing activity. A more complicated example of surprising insights hidden in crowdsourced data is shown in the figure on the left. Remember that in Wikipedia, the self-declared repository of the sum of all human knowledge, the spatial distribution of geocoded and „geocode-able“ articles is well known to be strongly biased. A map I made with my colleagues at the OII shows that a part of Europe features as many Wikipedia articles as the rest of the world. (By the way, there is an interesting Wikipedia page that discusses all kinds of biases that affect Wikipedia.)

Now, as the figure shows, despite this known severe lack of content in, e.g., the Middle East and North Africa (MENA), only about a third of the edits made by contributors in that region concern articles about the same region. Surprisingly, a large proportion of MENA’s (in absolute terms low) editing activity goes towards articles outside the contributors’ own region, about phenomena in North America, Asia and Europe. If you expected, as many people do, that contributors mostly edit articles about phenomena in their immediate environment and tend to „fill in gaps“ in content, this insight comes as a surprise.

Cultural, personal (education, careers, family relations, travel, tourism, …), linguistic, historical, colonial, political, and many more reasons may play into this.


straumann-geobeer8-slide-10

The new abundance of data, the proliferation of open (government) data and APIs, and the current popularity of information and data visualisation (infoviz/dataviz) as well as data-driven journalism (DDJ) have led to many more people and institutions obtaining, processing, analysing, visualising and disseminating data.

While this may be welcomed by data-inclined people in general, unfortunately it sometimes leads to people attaching false meaning to data or reading insights into the data that it does not support.

This example shows geocoded tweets in response to the release of a Beyoncé album. In my opinion, while technologically interesting, the visualisation has severe flaws in terms of (re)presentation, cartography and infoviz best practices. But: even more importantly, it utterly fails to mention e.g., that a) Twitter users are a highly biased, small subgroup of the general population, that b) the proportion of geocoded tweets is estimated to be in the very low percent numbers (often, < 3% is indicated!), that c) user contribution bias is likely at play, that d) geolocation may be faulty, etc. etc.


straumann-geobeer8-slide-11

Finally, this figure shows the result of „ping[ing] all the devices on the internet“ according to John Matherly of Shodan. The figure and the story went viral; they appeared, e.g., on Gizmodo, The Next Web, IFLScience! and many other outlets.

It turns out that, if you dig a bit deeper, there are some rather important disclaimers: e.g. a very limited time window during which the analysis was reportedly carried out and, more importantly, the fact that only devices addressed via IPv4 were pinged, with IPv6 not considered. You can read about these caveats in this Reddit thread.

It turns out that some countries in Asia that have recently invested heavily in broadband internet infrastructure, as well as large parts of Africa where the internet is mainly used on mobile devices, rely on IPv6 and thus show up as black holes, or rather dark regions, on this „map of the Internet“.

Sadly, the relative lack of access to the internet, to content and to netizens in Africa is real (cf. the OII Wikipedia analyses mentioned above). However, the situation, at least in terms of connected devices, is not as dire as this map makes you believe!

However, I think the very fact that the map played into this common narrative of unconnected, offline regions is an important factor in its massive proliferation (a.k.a. ‚going viral‘). Unfortunately, it seems all this sharing happened without discussion of the data source, the data collection method, the processing steps and the important disclaimers about the data’s validity and legitimacy – and, let’s face it, with very little critical reception and reflection on the part of the audience, i.e. us.

The effects? – The original tweet has been retweeted more than 5,500 times! Go figure.


straumann-geobeer8-slide-12

With these examples in mind, let’s turn to the classic Data-Information-Knowledge-Wisdom workflow or pyramid. In the DIKW mindset, data is composed of raw observations. Only structuring, pattern-detection, and asking the right questions turn data into information. Memorised, recalled and applied in a suitable context, information becomes knowledge. And finally, there’s the wisdom stage that is concerned with ‚why‘ rather than ‚what‘, ‚when‘, ‚where‘ and ‚how‘ etc.


straumann-geobeer8-slide-13

Well, it turns out one can argue rather convincingly that ‚raw data‘ does not, in fact, exist.

Data – and I would argue also crowdsourced data – is usually collected with an intent, an application in mind or, if not that, at least with a specific method, from a certain group of people, by a defined group of people, using a certain measuring device. Whether this happens implicitly or explicitly and willingly does not matter in this context. Clearly, however, these factors all potentially affect the applications the data can sensibly be used for.

So, there goes the title of my talk: ‚data‘ may not actually be ‚raw‘. And overly focussing on technology and missing out on the underlying technique can be dangerous!


straumann-geobeer8-slide-14

Putting it bluntly: Unlike this car, data is never general-purpose.


straumann-geobeer8-slide-15

For all these reasons, and because I care about our profession and about what is being done with data in the society at large (think: data-driven churnalism journalism, evidence-based politics, etc.) I would like to propose:

The Data Worker’s Manifesto.

It consists of only a few, easily memorised principles:

 


straumann-geobeer8-slide-16

Know your data!

Know the sources of your data, the collection methodology, the sample size and composition, consistency, and pre-processing steps possibly carried out by others or by yourself – more generally: the lineage, biases, quality issues, limitations, legitimate applications and use cases. Know all of these very well. If you don’t, try to find out. If you can’t be sure, refrain from using the data.


straumann-geobeer8-slide-17

Discuss data and how it’s being used.

The Internet and social media are wonderful things where thousands of links are shared. Every so often you may see an analysis with un(der)documented input data or methodology.

Reflect critically on what others may share blindly. If you have questions, remember: the Web is a two-way street these days. Gently but firmly ask them, and make your sharing of, and investment into, any analysis dependent on the answer.


straumann-geobeer8-slide-18

Create and share metadata!

If you do data-based analyses and produce visualisations, always keep track of what you have done with the data: Did you apply filters? Remove (suspected) outliers? Subsample, downsample, disaggregate, aggregate, combine, split, join, clean, purge, merge, … the data? Document your steps and assumptions and share this metadata to give your collaborators and your audience insight into data provenance and your methodology, along with the results.
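One lightweight way to do this (a sketch, not a prescribed format) is to write a small provenance record next to every derived dataset:

```python
import json
from datetime import datetime, timezone

# Minimal provenance record for a derived dataset; field values and file names are only an example.
metadata = {
    "source": "Sense Your City API, city-level extract",
    "retrieved": "2015-02-20",
    "processing": [
        "removed readings with missing temperature",
        "dropped suspected outliers (> 4 standard deviations from city mean)",
        "aggregated to 30-second windows per city",
    ],
    "created_by": "data_quality_pipeline.py",
    "created_at": datetime.now(timezone.utc).isoformat(),
}

# Store the record as a sidecar file next to the dataset it describes.
with open("city_aggregates.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```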

If you share your insights as social media content (e.g. a map as a PNG file), I recommend burning the metadata into the result, i.e. putting the metadata somewhere into the content so that it’s hard to remove. Because said content will – at some point – be taken, proliferated, received and analysed out of context. Guaranteed.
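"Burning in" can be as simple as rendering a one-line provenance note onto the image before sharing it; here is a sketch using Pillow, with placeholder file names and text:

```python
from PIL import Image, ImageDraw

img = Image.open("tweet_density_map.png")   # placeholder input file
draw = ImageDraw.Draw(img)

note = "Data: geocoded tweets, 2015-02-01 to 2015-02-07 (approx. 1-3% of all tweets) | map: @author"
# Draw the provenance note into the lower-left corner so it travels with the image.
draw.text((10, img.height - 20), note, fill="black")

img.save("tweet_density_map_annotated.png")
```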


straumann-geobeer8-slide-19

3b is very similar to 3: Create and share metadata!

Seriously: I know metadata is uncool and not sexy at all to maintain. But nothing good comes from not doing it!


straumann-geobeer8-slide-20

Experts are valuable.

While the „end of theory“ has been proclaimed, I think the „report of [its] death has been greatly exaggerated“.

Being, or being in contact with, a domain specialist is still very valuable. Sometimes, especially for harder, i.e. more interesting, analyses, it is indispensable. At the very least, expert knowledge may save you from doing something silly with data you don’t completely understand.


straumann-geobeer8-slide-21

We’re in this together.

I feel we are all still coming to terms with the new opportunities the Web and some of the data-related developments I mentioned provide to us (let alone methodological and computational improvements and societal developments). It can be a bumpy, but in any case an exciting, ride, so let’s buckle up, meet and talk and share our experiences – but that’s obviously why all of you have come to this GeoBeer in the first place!


straumann-geobeer8-slide-22

straumann-geobeer8-slide-23

I feel that despite all these potential pitfalls we should perceive the abundant data, especially new data types such as crowdsourced and open government data, as huge opportunities!

I’m convinced that, with the right people and the right mindset, we can do great things, privately or politically, that have the potential to improve our respective environments ever so slightly.

I feel that Switzerland as a democratic and affluent country provides us with an especially friendly environment to get involved, in business, in research, and in societal goals.

Thank you all for your attention!