Fitting the Alps into an App

Recently, we faced a peculiar dilemma: we wanted to equip a mobile app prototype with an attractive basemap for overview purposes. Our prototype app is focused on Switzerland, and we wanted (almost needed, really) to incorporate elevation information in the form of at least a shaded relief. For the purposes of this blog post it suffices to say that we wanted to show a real-time position on top of the basemap. The app was geared towards an international audience that doesn’t necessarily have a mobile data plan or WiFi access at the time they want to use the app. So what to do?

Since our app audience wouldn’t have the opportunity to download basemap data on the go, we considered our options: a) requiring an additional large (WiFi) download before the first use seemed very unattractive; on the other hand, b) we didn’t want to bloat the app download by packaging a huge amount of basemap data with it either. And a shaded relief at a sensible, non-pixelated resolution doesn’t come cheap in terms of payload.

Vector tiles

Thankfully, the Esri ArcGIS Runtime SDK for Android and ArcGIS Runtime SDK for iOS (Quartz releases) offered a way out of our dilemma: since August 2015, beta versions of these SDKs have offered the capability of rendering vector basemaps. The first production releases of the Quartz SDKs are due in November 2016, and Quartz Runtime SDKs for other platforms will follow soon (Esri ArcGIS Runtime SDKs). In a stroke of insight, we came up with the plan to simply incorporate a compact vector version of a shaded relief map.

<audible gasp>

Actually, it’s not as crazy as it sounds: we were pretty sure we could discretize a shaded relief into a handful of classes, generalize it thoroughly (really thoroughly!) and arrive at something attractive and functional and smaller than tiles of a raster shaded relief.

Slimming down the data

The actual process involved an EBP-owned digital elevation model (as the official one is not yet open data, unfortunately, ahem). We computed a shaded relief and after several tries arrived at a promising discretization into merely 5 classes. Additionally, we computed and vectorized 3 elevation intervals, mainly to give the Swiss lowlands some additional elevation information. The workflow involved a mixture of tools: mainly ArcGIS for Desktop, choice functions from ET GeoWizards, as well as a hint of FME. The final vector shaded relief comprises a total of 13,714 individual features after rigorous generalization of both spurious features and vertices (we had started out at 27,237).
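Conceptually, the discretization step boils down to mapping continuous hillshade brightness values (0–255) to a handful of classes before vectorizing. Here is a minimal sketch in Python; the class breaks are made up for illustration and are not the values we actually used:

```python
# Sketch of the discretization step: reduce a continuous shaded relief
# (brightness values 0-255) to five classes before vectorizing.
# The class breaks below are illustrative only.

BREAKS = [60, 110, 160, 210]  # hypothetical upper bounds of classes 0..3

def classify(brightness):
    """Map a hillshade brightness value (0-255) to one of 5 classes."""
    for cls, upper in enumerate(BREAKS):
        if brightness <= upper:
            return cls
    return len(BREAKS)  # brightest class

def discretize(raster):
    """Apply the classification to a whole raster (list of rows)."""
    return [[classify(v) for v in row] for row in raster]

relief = [[12, 75, 130], [200, 250, 95]]
print(discretize(relief))  # -> [[0, 1, 2], [3, 4, 1]]
```

The resulting class polygons are what we then generalized and packaged as vector tiles.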

Styling options

See the results for yourself. A coloured version of the basemap:

And with a set of freely available geodata overlaid (click for larger image):

This is how this version looks on a mobile device (click for large image; iPad Air template file CC-BY Netspy):

Besides these more colorful versions, you can generate a neutral basemap in gray shades to give more attention to the data displayed on top of it. A vector tile package containing the basemap below weighs in at around 14 MB and is thus suitable to be packaged with the initial app download. By our conservative estimate, an equivalent image tile basemap would be at least ten times that size.

Vectorized relief map: neutral colors allow for more attention to the data we will put on it in the app

Good usability makes happy users

It’s not a perfect solution, but it serves our purposes for the prototype app really well and uses the latest Esri technologies to keep the app payload light.
Furthermore, vector data looks good at all zoom levels, not only at the discrete levels for which image tiles are generated. As a bonus, zoom and pan interactions are much smoother with vector tiles than with image tiles.

With a bit more time on our hands we could certainly refine the process further and iron out remaining kinks. Let us know if you face similar challenges around app development, data munging or user interfaces and would welcome some innovative thinking from our team of analysts and developers. Get in touch!

Pedestrian Isochrone Maps

On Monday, October 3rd, the 17th Annual Conference on Walking and Liveable Communities, Walk21 Hong Kong, opened its doors to more than 500 participants. One of the speakers is our Ivo Leiss. In his presentation, he will speak about Walkalytics – EBP’s approach to data analytics for business questions related to pedestrians.

Walking has always been a topic on our agenda. As early as 2013, we wrote about accurate analytics for pedestrian accessibility and quality of service for public transport. Since then, we have extended and refined our methodology for pedestrian mobility analysis and successfully applied it to our customers’ business and location intelligence tasks.

The Walkalytics method

At the heart of our approach are isochrones. Isochrone maps for different modes of locomotion are the hot new thing and there are a lot of interesting blog posts and offerings available, for example on Google Maps Mania or on Medium.

In contrast to the abundant graph-based methods, we take a different path (no pun intended): our pedestrian isochrones show precise walking times across a whole neighborhood from any starting point. Rather than following a network of streets and paths, they are an aggregate of thousands of individual paths, bundled into one result. As opposed to other isochrone analyses, our approach takes into account desire paths and potential shortcuts across open spaces such as large squares. And it takes less than a second to compute! But a picture is worth a thousand words, and an animated picture is priceless:

Pedestrian isochrones for a location in the city of Bern, calculated using Walkalytics. The calculation is based on OSM data.

The animation demonstrates our area-based approach: starting at a particular point, thousands of virtual pedestrians start walking in every possible direction. Every few metres, they ‘measure’ their walking time and continue walking. Their walking speed depends on the walkability of the ground they are covering: it’s faster to walk on a nice path than on rough terrain; it’s forbidden to walk on a highway or across a railroad and impossible to walk across water. Additionally, we take into account the topography: walking uphill and downhill is associated with different costs depending on the slope. Using the Walkalytics approach, it is also possible to model walking times based on custom rules for the underlying surfaces and topography.
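In spirit, this area-based approach resembles a cost-distance computation: a shortest-path search over a walkability grid rather than over a street graph. Here is a much simplified sketch using Dijkstra’s algorithm over a raster; it illustrates the idea only and is not the actual Walkalytics implementation (which also weights slope, surface types, and more):

```python
import heapq

# Minimal sketch of an area-based isochrone: Dijkstra over a walkability
# grid. Each cell holds the seconds needed to cross it; None means
# impassable (water, railway). A simplification, not Walkalytics itself.

def isochrone(grid, start, limit):
    """Return reachable cells -> walking time (seconds) from `start`."""
    rows, cols = len(grid), len(grid[0])
    times = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, (r, c) = heapq.heappop(queue)
        if t > times.get((r, c), float("inf")):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                nt = t + grid[nr][nc]
                if nt <= limit and nt < times.get((nr, nc), float("inf")):
                    times[(nr, nc)] = nt
                    heapq.heappush(queue, (nt, (nr, nc)))
    return times

grid = [[4, 4, 4],
        [4, None, 4],   # impassable cell, e.g. water
        [4, 4, 4]]
t = isochrone(grid, (0, 0), limit=60)
print(t[(2, 2)])  # walking around the obstacle
```

Every cell reached within the limit gets a walking time, which is exactly the kind of dense result the animation visualizes.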

Your advantages

What are some of the advantages of our approach to computing isochrones for your business or agency?

  • Very detailed results: with one computation, we can show the area that is accessible from any given point within any given timespan, not only for a few discrete time steps.
  • We don’t need routing-capable data; we simply model every patch of your neighborhood based on its walkability.
  • We can easily combine multiple data sources to model walkability, such as national mapping data, cadastral or surveying data, municipal data, and e.g. OpenStreetMap. Combining data sources for best coverage is easy. This flexibility in adapting to, and using, different data sources has proven tremendously helpful in recent projects.
  • It’s fast, especially considering the information value of the result: computing one isochrone at 5 meters resolution with an upper limit of 20 minutes of walking, we analyse literally thousands of individual paths and get hundreds of thousands of walking time measurements as a result. And all this information can still be computed in much less than a second on an ordinary laptop.

Isochrones are certainly interesting! But what is their value for authorities and businesses? What are their use cases? In future blog posts, we will discuss some interesting applications. Meanwhile, you can visit the Walkalytics website, test-run our API or simply play around and create your own animated isochrone by clicking in the map below (computation of these may take up to around 20 seconds, because creating animated GIFs takes much more time than computing the isochrone):

Offline Editing with ArcGIS

We are currently working on a mobile app with offline geodata editing capability. The idea, of course: users collect and edit data in the field and synchronise the changes back into a central database when they return to their offices. The ArcGIS platform provides all the tools to implement that functionality easily. The necessary configuration is described in this tutorial. However, if you use your own app instead of Esri’s Collector app, you have to consider a few additional points. You could find these out from various help pages – but to save you a long search, I’ll share them in this post.

Offline geodata collection and editing with ArcGIS can facilitate railroad track maintenance

Challenges – and a solution

We basically had two challenges to solve:

  • How do we bring the edits that were made outdoors back into the DEFAULT version of the database?
  • How do we get rid of all the (in this context: superfluous) versions the offline sync creates?

A look behind the scenes is helpful to understand what’s going on: For offline synchronisation in the Esri ecosystem you first need an enterprise geodatabase. The feature classes can either be versioned, or non-versioned with archiving activated. In our multi-user environment the second option was not viable, thus we had to go with versioning. You can find more information about setting up your environment here.

When a user then downloads data from a feature service, the following happens: the database creates a new version that is derived from the DEFAULT version. This new version gets a name like username_ID (for example Esri_Anonymous_i_1466000080844 if your map service is not secured). ArcGIS then creates a replica based on this version. This is a file geodatabase that is stored on the mobile device. One more important detail: in the database, the replica is a child version of the originating version. This “hidden version” is invisible with the normal ArcGIS tools. You can only see it in the SDE.VERSIONS table, where it appears with a name like “SYNC_”.

After offline editing the user starts the synchronization process. However, this does not yet bring your edits back to the DEFAULT version of the database. Synchronize only reconciles the data between the replica and the originating version. Afterwards you need to use the Reconcile Versions tool to finally see your edits in the DEFAULT version and hence your map service. In order to streamline this process, for our application we created a geoprocessing service based on the Reconcile Versions tool which the mobile app calls after synchronization is complete (see also this help page).

Addendum for maintaining performance

The above process works fine. But there is, as you may know, one flaw: the database versions don’t automatically disappear but keep piling up. This can become a problem when the version tree grows big – it makes database compression inefficient. In the end (certainly in our project, with several hundred versions), you can end up with an incredibly slow database. So let’s get rid of those unnecessary versions as soon as possible!

The Reconcile Versions tool has an option to delete versions afterwards. Unfortunately, the tool fails consistently with the error message “Error deleting version […] Operation not allowed on a version with dependent children”. But there are no child versions visible in ArcGIS. So what gives? Of course, the problem arises from the “hidden versions” of the replicas, which keep us from deleting their parent versions. And how do we eliminate those? First of all, by unregistering the replicas using the REST interface of ArcGIS Server. But in our case it turned out that this was not enough: somehow we had some “SYNC_” versions that did not have a replica registered on ArcGIS Server. I am not entirely sure where these come from. Maybe they are created when a user aborts the data download from the map service? In any case, you can remove those versions using the Delete Version tool – although they are not visible in ArcGIS Desktop. You just need to find the version names using either arcpy.da.ListVersions or a query on SDE.VERSIONS.
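To illustrate that last step, here is a small sketch of the filtering logic for finding those orphaned hidden versions. In the real workflow, the version names would come from arcpy.da.ListVersions or a query on SDE.VERSIONS; the sample names below are invented for illustration:

```python
# Sketch of the cleanup logic: pick out hidden replica versions so they
# can be removed with the Delete Version tool. In the real workflow the
# names would come from arcpy.da.ListVersions() or a query on
# SDE.VERSIONS; the sample list below is made up.

def hidden_sync_versions(version_names, registered_replicas):
    """Return SYNC_ versions that have no replica registered on the server."""
    return [name for name in version_names
            if name.split(".")[-1].startswith("SYNC_")
            and name not in registered_replicas]

versions = [
    "SDE.DEFAULT",
    "GDB.Esri_Anonymous_i_1466000080844",
    "GDB.SYNC_12345",
    "GDB.SYNC_67890",
]
registered = {"GDB.SYNC_12345"}  # still has a replica -> unregister it first
print(hidden_sync_versions(versions, registered))  # -> ['GDB.SYNC_67890']
```

Versions that still have a registered replica should be unregistered via the ArcGIS Server REST interface first; the remaining orphans can then be passed to the Delete Version tool.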

The overall workflow in the end looks like this:

While the ArcGIS platform provides a great ecosystem, there are some murky corners where things can become convoluted – as in every GIS ecosystem, let’s face it. I hope these tips help you better understand offline editing and synchronization. If not or if you have an even trickier problem to solve, my colleagues and I are happy to take your questions, or hear your experiences.

Patrouilles des Glaciers: Track Replay

Comparing Positions with Time Lag

As blogged earlier, I spent a week at the race office of the Patrouille des Glaciers (PdG). During the race, each PdG team of three athletes was equipped with a GPS device that kept broadcasting its position to the race office. Furthermore, this device was equipped with an emergency button able to send a distress message along with its current position to the race office.

We wanted to know whether this security feature could be used for replaying the tracks of the individual teams. Replaying race tracks becomes interesting when you can compare teams. A team might ask: where did we lose or gain time with regard to our peers? The start of the PdG, however, was staggered, i.e., there were several starts, half an hour to an hour apart. Thus, comparing teams that competed in different time slots requires that we include a time shift compensation.
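Such a time shift compensation is conceptually simple: express every position fix as time elapsed since the respective team’s own start, rather than as absolute clock time. A minimal sketch (the timestamps and coordinates below are invented):

```python
from datetime import datetime

# Sketch of the time-shift compensation: instead of absolute clock time,
# each fix is expressed as elapsed time since the team's own (staggered)
# start. The sample timestamps and positions are invented.

def to_elapsed(track, start_time):
    """Convert (timestamp, position) fixes to (seconds_since_start, position)."""
    return [((t - start_time).total_seconds(), pos) for t, pos in track]

start_a = datetime(2016, 4, 19, 22, 0)   # first wave
start_b = datetime(2016, 4, 19, 22, 30)  # a wave half an hour later

track_b = [(datetime(2016, 4, 19, 22, 32), (7.74, 46.01)),
           (datetime(2016, 4, 19, 22, 34), (7.75, 46.02))]

print(to_elapsed(track_b, start_b))  # positions at 120 s and 240 s after start
```

On this common elapsed-time axis, teams from different waves can be replayed side by side.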

Data Quality

Initially, we analyzed the data from the first race (Tuesday/Wednesday). The teams follow a well-signposted track (red line in the figure below). As you can see, the team positions reveal some astonishing errors. We suppose these positional errors are due to limited “visibility” of GPS satellites in the mountainous terrain as well as multipath GPS signals that introduce an error into the time-of-flight measurement.

Positional accuracy on the leg Zermatt – Tête Blanche (positions of 10 teams in different colors, following the red line)

The GPS device is supposed to send its position every 2 minutes. However, this was clearly not always the case: in the example below we observed a gap of 40 minutes between positions – visible in the map below, where the points are located far apart.

Missing data on the leg Pas du Chat – Verbier (positions of one team)

Data Preparation for Track Replay

For displaying race tracks, we decided to show only points that lie within a buffer of 1,000 meters around the track, rather than projecting points onto the track from too great a distance and thus feigning a better accuracy.
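The buffer test can be sketched as a point-to-polyline distance check in planar (metric) coordinates, e.g. the Swiss grid. The track and fixes below are invented for illustration:

```python
import math

# Sketch of the 1,000 m buffer filter: keep a GPS fix only if it lies
# within `buffer_m` of the track polyline. Assumes planar coordinates in
# meters (e.g. Swiss grid); the sample track below is made up.

def dist_point_segment(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(points, track, buffer_m=1000.0):
    """Keep points within buffer_m of any track segment."""
    return [p for p in points
            if min(dist_point_segment(p, a, b)
                   for a, b in zip(track, track[1:])) <= buffer_m]

track = [(0, 0), (5000, 0)]               # straight 5 km leg
fixes = [(1000, 200), (2500, 1500), (4000, -900)]
print(within_buffer(fixes, track))  # -> [(1000, 200), (4000, -900)]
```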

GPS cannot measure elevation as precisely as horizontal position. Therefore, we replaced the GPS elevations with elevations from a terrain model.
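Substituting terrain-model elevations amounts to sampling the DEM at each fix position, for example by bilinear interpolation in a regular grid. A minimal sketch with an invented 2×2 grid:

```python
# Sketch of replacing a GPS elevation by a terrain-model elevation via
# bilinear interpolation in a regular grid. The grid values are invented.

def dem_elevation(dem, cell, x, y):
    """Bilinearly interpolate the DEM at planar position (x, y)."""
    col, row = x / cell, y / cell
    c0, r0 = int(col), int(row)
    fx, fy = col - c0, row - r0
    z00, z10 = dem[r0][c0], dem[r0][c0 + 1]
    z01, z11 = dem[r0 + 1][c0], dem[r0 + 1][c0 + 1]
    top = z00 * (1 - fx) + z10 * fx
    bottom = z01 * (1 - fx) + z11 * fx
    return top * (1 - fy) + bottom * fy

dem = [[1600.0, 1620.0],
       [1640.0, 1660.0]]          # 2x2 grid with 25 m cells
print(dem_elevation(dem, 25.0, 12.5, 12.5))  # -> 1630.0
```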

We did not interpolate intermediate points where there are time gaps. Therefore, not all the teams will move smoothly.

Track Replay Application

At EBP, we then developed a proof-of-concept application, Track Replay, for replaying the 2016 PdG tracks. In real-time mode, you can replay the Tuesday/Wednesday race the way it took place, i.e. including the offset race starts of the different teams. In compare mode, all teams start at the same time in Zermatt and Arolla, respectively. In this mode, you can compare selected teams by ticking their respective check boxes on the left. Placing the cursor on a dot on the elevation profile identifies the team and highlights its position on the map.

The team list on the left is ordered by (current) rank. In theory, at the end of the replay the list order should correspond to the official ranking lists Z1 and A1 by race result. However, this is not quite the case, because our ranking is based on the distance along the track at a given time, and that distance is derived from the GPS position projected onto the track. Since the quality of these positions is often questionable, the projected positions are affected as well.

Track Replay proof of concept by EBP.

Thus, our proof of concept shows the idea of a track replay supporting a comparative mode. However, the capture of the positions with the tracking devices used in PdG 2016 is not yet quite suitable for this application. The great news is: An exciting and promising technology by race result that combines timing and tracking using active transponders will be available soon!

Please note that Track Replay is in the prototype phase. For best results (e.g., in order to display the dots on the elevation profile) we recommend using the Firefox web browser. Track Replay will be online for a couple of days only. For more details concerning the Patrouille des Glaciers, please have a look at the official PdG website.

Are you interested in getting to know more? Feel free to contact me.

The Next Evolution of GIS

… that was the title of my article and talk in the Innovation and Trends track at GEOSummit 2016. What was it about? The geodata offerings of the cantons and the federal government are in place, services and in part data downloads are ready, and authorities as well as private actors use GIS on the desktop, online and mobile on-the-go for spatially relevant questions. In my contribution, however, I deliberately wanted to look beyond the day-to-day business and capture some changes that we often fail to notice because of their subtlety and amid all the routine.

In doing so, I concentrated to a large extent on “soft” factors, for example changes in the environment in which GIS is used. Of course, all the well-known technological upheavals are happening alongside: drones, augmented and virtual reality, cloud computing, wearables, nearables, autonomous systems and bots, sensor networks and smart infrastructure, etc. etc. Some of these also appear in passing in my contribution (and I am happy to discuss the technological side here or elsewhere); the technical aspects, however, are not the focus of my considerations.

You can view the slides of my talk here:

And if you are interested, you can find the full text of my GEOSummit abstract here:

In many areas of our lives, we rely on complex infrastructures and services. For example, five minutes after our train arrives, a bus takes us to our destination. We are supplied with water, electricity, gas or district heating. Wastewater and waste are reliably carried away. The shelves in the shops are always stocked, and the required spare part is reliably delivered to the garage.

The basis for this smooth functioning of our infrastructure – and of our social, economic and political life – is the careful planning, control and maintenance of the facilities and processes involved. For this, information is an indispensable foundation. Just as the discovery and use of oil fueled the industrial society in the last century, information is the most important raw material of our knowledge society.

The generation and use of information are subject to changes that also affect the geoinformation industry. In its reflections on the knowledge society, the Swiss Federal Office of Communications identifies four main trends: Mobile, Social, Cloud and Information (Fig. 1).

Fig. 1: The main trends “Mobile”, “Social”, “Cloud” and – at the center – “Information”, as well as the actors involved (own illustration)

Starting from these trends: what lies ahead?

A changing environment

In the knowledge society, the use of information continues to increase in administration and politics, but also in civil society. Behind the latter are, in part, new groups of geoinformation users that have formed in the course of the developments outlined above: for some time now, geodata have been used ever more frequently in data-driven journalism, among other fields. In addition, the open data movement has created new users who often do not come from the typical disciplines. Not to be underestimated, furthermore, is the broad impact of the federal spatial data infrastructure (BGDI) with the map.geo.admin API and the partially opened federal data holdings.

The demands on our industry are evolving accordingly: for example, comprehensive and generally understandable documentation of data, fast communication at eye level, easy use (or at least viewing) of geoinformation in portals, but also the provision of open services, APIs and data (where possible in real time). As previously underrepresented actors step forward, established – but perhaps also outdated – practices will increasingly be called into question. For providers of geoinformation, this opens up the chance to channel the momentum of these new user groups into, for example, product development or quality improvements.

Consumerization and mainstreaming

GIS will increasingly become a generally deployed technology or method: “GIS as a utility”. This is already visible in the progressing (light) GIS-enablement of office software. For simple tasks such as mapping branch locations or geocoding a customer base, GIS specialists will no longer be needed in the future. This is the maturing of GIS: the term “GIS” partly merges with other topics and disciplines. And: not everything with GIS inside has “GIS” written on it.

The trends outlined above enable a large group of people to collect data – often: geodata – themselves, to use and combine data from different sources, to process it and to disseminate it further. The availability of free software also contributes to this. How far consumerization will go is still hard to gauge.

New complexity: IoT and smart systems

However, technological impulses such as the Internet of Things (IoT) and smart infrastructure, the participatory internet, but also trends such as Quantified Self and virtual and augmented reality bring new complexity with them: the volume of data, already hard to keep track of today, will grow even further. Data streams will become more important than datasets. Companies and authorities (e.g. smart cities) will have to gain the right insights by filtering and combining data streams.

This brings new challenges in the processing and analysis of data, but also in the development of future business models. Geoinformation specialists will still be in demand here, but they will also have to measure themselves against, for example, “spatial data scientists” – or develop into them.

A Look Back at GEOSummit 2016

We at EBP Informatik attended the most important Swiss GIS conference, GEOSummit 2016. As an Esri partner, we presented our solutions, projects and latest developments around the topic of pedestrian mobility to interested visitors. In addition, Stephan Heuel and I each gave a talk in the session Innovation and Trends I. Here, we would like to look back at GEOSummit 2016 in the form of a richly illustrated Twitter-based review. You can also scroll through the story in a separate window here. Enjoy!

Time Keeping at the Patrouille des Glaciers – A Look behind the Scenes

The Patrouille des Glaciers (PdG) is an international ski mountaineering race organised by the Swiss Armed Forces in which military and civilian teams compete. It is said to be the world’s toughest team competition. The very long race distance, the extreme route profile, the high altitude and the difficult alpine terrain with glaciers and couloir climbs are the main features of this unique competition.

As announced in November 2015, Ernst Basler + Partner teamed up with race result Swiss for the time keeping of this remarkable event. Let me give you a brief impression of what was going on behind the scenes of the time keeping, under the guidance of Hanno Maier of race result Swiss.

Sunday April 17 2016

The start lists are published. And all preparations for the race are completed:

  • The time keeping hardware for the teams (personalized start numbers for chest, thigh and helmet and active transponders for more than 5’000 competitors) is configured, packed and ready to be used.
  • The active decoding systems from race result are checked. The timepieces and the corresponding supports are packed in the race result van.
  • The very warm and ultra-thin outfits for the time keepers are branded with race result.
Our team: The time keepers and the three staff members of the race office.

Monday April 18 2016

The time keepers arrive at the race office at the barracks in Sion. We distribute the decoding systems and the outfits, and the time keepers receive their last instructions. They pack their mountaineering and climbing equipment together with the time keeping hardware and head off to the air base, from where they are supposed to be flown to their posts. However, the weather is not good enough for flying – the waiting begins.

The time keepers get ready with their equipment.

In the meantime, there are many changes to the start list to be made, e.g., shifts in start time or replacements in teams. This job keeps us busy until shortly before the start. (No problem for the race result software!)

Back stage time keeping in the race office

Tuesday and Wednesday April 19/20 2016 – a looong double day

The weather has cleared up. The time keepers and their equipment are flown into the Valais Alps. For us at the race office, the crucial phase begins: do we get signals from all fourteen decoding systems? Great relief – the first station is online. We monitor its status and have the detection tested. By noon, half of the stations are operational. Two of them need some extra care because of low transmission power. (Fortunately, we could arrange a helicopter flight in time to fly in additional hardware!) At 5 PM the time keeping network is complete and operational – milestone achieved.

Now we are waiting for the first start, which is scheduled for Tuesday 10 PM: race result at the start of the 2016 PdG. In the end, 332 patrouilles cross the starting line in Zermatt and another 389 in Arolla, in several waves until 6 AM the next day.

Everything goes well. The first patrouilles reach Schönbiel, our first time post. The monitoring of the patrouilles goes on all night. So far so good.

The rankings are available live. On Wednesday at 08:22:25 AM, the first patrouille from Zermatt crosses the finish line in Verbier. At 1 PM we communicate the winners to the race committee. Around 4 PM the last patrouille (that made it to the finish) arrives in Verbier. We publish the final, fully up-to-date ranking list immediately afterwards. The interest in the results is quite remarkable: the rankings page has already had 600,000 hits – and the race has only just ended.

Now we are tired but very happy that the time keeping went perfectly well, without any noteworthy incidents. The race officer in charge congratulates us – everybody is happy! We mastered a technical, logistical and communications challenge: the time keeping at the PdG. A big thank you to the team at the time posts and in the race office!

The second race is scheduled for Thursday April 22. The results will be available live.

Are you interested in getting to know more? Feel free to contact me.

LoRaWAN: IoT Network for the Future?

If you follow someone from #TeamEBP on Twitter, you may have noticed that last week we installed a LoRaWAN gateway of The Things Network in our office building. And like some of my colleagues you may have wondered (or wonder now): What is this all about?

Is EBP now into selling parrots (of course we could call our parrot Polly, not Lora)? Or are we supporting an alternative Zurich radio station? Good guesses. But it is of course neither of those two: LoRaWAN stands for Long Range Wide Area Network, a technology for low power wireless telecommunication networks. LoRaWAN gateways are intended to be used by battery operated sensors and other low power devices, nowadays better known as the Internet of Things (IoT), to transfer their data to the internet.

While mobile and WiFi networks drain your phone battery quickly as data transfer rates increase, LoRa takes the opposite approach: only very little data can be sent over the network, in order to minimize power consumption. Take, for example, the optimization of garbage collection by installing sensors on waste bins – a solution that is already more widespread than I expected. You would certainly use batteries, maybe combined with energy harvesting, rather than connect every garbage container throughout a city to the power grid.


Have you ever noticed the amazing anticipation of IoT ideas in “Frau Holle”? Bread calling out: “Oh, take me out. Take me out, or I’ll burn. I’ve been thoroughly baked for a long time.” (Image source: public domain).

LoRaWAN Gateways serve as transparent bridges for the end-to-end encrypted communication between sensors and devices out in the field and central network servers (you can read more about the technology here). One big advantage of LoRa is that you only need a few of these gateways to cover a whole city.

While commercial companies are working on LoRa networks (e.g. Swisscom or Digimondo), the aforementioned The Things Network (which EBP is now a part of) is an interesting open initiative. With The Things Network, an enthusiastic community is building LoRa networks in cities all around the world. These networks are free and open for everybody to use. At EBP, we immediately felt favourably towards that idea and are excited to share some of our company’s bandwidth with the community behind The Things Network.

The Things Network Zurich coverage map with the EBP gateway

As an additional benefit, we thus expand our playground for experimenting with IoT and new networking technologies. Our order for additional hardware to build some LoRa test devices is out, and we are looking forward to doing some soldering. So stay tuned for more LoRa news here. Or indeed, join the revolution yourself!

R: Something for You, Too?

R at EBP

CC-BY-SA The R Foundation

On this blog, we have already shown several analyses and visualizations created (partly) with R: for example, my three-part series on analyzing bicycle count data with R, and Bence Tasnády’s and Nadine Rieser’s entertaining three-part report on the Euler tour through Zurich by tram.

At EBP, we use R in many different ways:

  • for cleaning and reshaping data,
  • for descriptive and inferential analyses, and
  • for agent-based modelling, for example in the area of energy prices – and for quite a bit more.

A few weeks ago, for example, I used R to cluster municipalities based on about a dozen attributes. With the computed similarity measure between municipalities, it was then easy to build a suggestion feature similar to Amazon’s. Along the lines of: “You are interested in Gossau. Would you like to compare Gossau with Flawil, Uzwil, Wil, Herisau or Rorschach?”
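The analysis itself was done in R, but the suggestion logic is easy to sketch in a few lines of Python: rank municipalities by distance over (normalized) attribute vectors and propose the nearest ones. All attribute values below are invented:

```python
import math

# Illustrative sketch (the original was done in R): rank municipalities
# by Euclidean distance over normalized attributes and suggest the
# nearest ones. All attribute values below are invented.

def suggest(target, municipalities, n=3):
    """Return the n municipalities most similar to `target`."""
    others = {name: attrs for name, attrs in municipalities.items()
              if name != target}
    ref = municipalities[target]
    ranked = sorted(others, key=lambda name: math.dist(ref, others[name]))
    return ranked[:n]

data = {                       # hypothetical normalized attributes
    "Gossau":    (0.4, 0.6, 0.5),
    "Flawil":    (0.4, 0.5, 0.5),
    "Uzwil":     (0.5, 0.6, 0.4),
    "Herisau":   (0.6, 0.3, 0.7),
    "Rorschach": (0.9, 0.1, 0.2),
}
print(suggest("Gossau", data))  # -> ['Flawil', 'Uzwil', 'Herisau']
```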

What is R good for?

So why do I find R interesting, and why do I use the programming language and software R alongside Python, SQL, ETL tools and others? Here is my list of points. For other people, other advantages or disadvantages may of course be decisive (basically: YMMV):

  • Much like Python, R offers with the Comprehensive R Archive Network (CRAN) a very large collection of libraries covering many functions that "base R" does not cover, or not equally well. For example: web scraping, network modelling, exploratory data analysis, static and interactive visualization, processing of geodata, data transformations, etc. What I sometimes perceive as a drawback of R (especially compared to Python): there is not always one (most) obvious way to do something. The abundance of libraries is one reason for this.
  • R can read (and in many cases also write) numerous data formats, including geodata. Access to various databases, NetCDF files, tabular data (Excel, CSV, TSV, etc.), XML files or JSON files is readily possible.
  • Data transformations are one of R's strengths: whether you want to reclassify, clean, replace values, filter, subset, sample, group, aggregate or transpose your data, almost anything is possible with the powerful transformation functions of, for example, dplyr or base R.
  • Easy computation of descriptive statistics such as mean, median, standard deviation, skewness of a distribution, and much more, also on faceted (grouped) data.
  • Machine learning techniques such as regression analysis, classification, clustering, multi-dimensional scaling (MDS), and much more.
  • Numerous ways to derive common visualizations from data, such as bar charts, line charts or scatterplots, for example with ggplot2, probably the most popular visualization library; but also maps, for example with ggmap, and interactive visualizations with ggvis and shiny.
  • R can also produce more specialized visualizations such as star plots/spider plots, box plots, violin plots, small multiples or heatmaps.
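A few of these points in one compact sketch, using the built-in `mtcars` data set (the `dplyr` and `ggplot2` packages need to be installed):

```r
library(dplyr)
library(ggplot2)

# Descriptive statistics on grouped ("faceted") data with dplyr
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            sd_mpg   = sd(mpg),
            n        = n())

# A scatterplot with ggplot2, using small multiples per cylinder count
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)
```

The same grouping, summarising and plotting pattern scales from this toy example to data sets with millions of rows.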

Why R?

Even more important than these functions, in my view, are advantages on a higher level. Especially for data preparation, data analysis and data visualization, R enjoys, in my opinion, a major advantage over other, far more widely used tools such as spreadsheet software (Excel, Libre Office, etc.): in R, all processing steps are scripted (in the R language), from loading the data through any joins, transformations and aggregations, pivot tables, reclassifications, filters and analysis steps, all the way to producing charts.

The advantages of this approach compared to working in Excel (the way most people work with Excel) are:

  • Transparency: I can save all processing steps that led to a result in the form of a script. Much later, I and others can still look up, for example, which transformations were applied to the data. In addition to the source code, I can support transparency with explanatory comments. Since the script is a text file, I can also put it under version control, for example on GitHub.
  • Reduced error-proneness: Since the processing steps are scripted and generally do not depend on keyboard or mouse input at the "run time" of the analysis, the susceptibility to errors is, in my opinion, generally reduced. Of course, errors can still creep into a script, but, for example, the faulty cell references one encounters now and then in large Excel files (and which Excel hides well) do not exist in R. (As is well known, wrong references in Excel can lead you astray in economically very important decisions.)
  • Reproducibility: Has the content of your data changed since you last touched it? No problem: I can simply re-run my R script on the additional, updated or corrected data, and R repeats the same preparation and analysis steps and churns out two dozen or even hundreds of updated charts in the background while I attend to other problems or drink a tea. This is in no way comparable to the effort that would likely have been necessary had the whole workflow not been scripted. When I regenerate charts with R, I also do not run the risk, as with Excel and the like, of forgetting an important manual step or executing it incorrectly. Nor do I have to convert all the Excel charts into raster graphics again for further use.
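The reproducibility point can be illustrated with a small sketch; the file name and column names (`site`, `date`, `count`) are assumptions for illustration only:

```r
library(ggplot2)

# Hypothetical input file; re-running the script against updated data
# regenerates every figure without any manual steps
daten <- read.csv("counts.csv")

for (s in unique(daten$site)) {
  p <- ggplot(subset(daten, site == s),
              aes(x = as.Date(date), y = count)) +
    geom_line()
  ggsave(paste0("plot_", s, ".png"), p)  # one updated chart per counting site
}
```

Whether the input holds five sites or five hundred, the script produces one current chart per site, every time, in exactly the same way.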

Last but not least: with the R-Bridge, R and ArcGIS are moving much closer together. For example, data in a file geodatabase can be read and analysed in R. R will also play a stronger role in the Microsoft ecosystem: for instance, analyses can be written in R within the cloud-based Microsoft Azure Machine Learning (ML).

Has this article sparked your interest in R? Is R the right tool for your organization? Would you like an in-depth introduction? How can R be combined with your existing tools or your Python scripts? Feel free to contact me with no obligation.

Forget Apps – Bots Are the Future of Geo

Update 2016-12-01: We have published a second version of our bot together with a landing page. It now supports Facebook Messenger, Skype and Slack.

The title of this blog post may seem hyperbolic, but during the last few months there has been increasing buzz around personal assistants and bots such as Amazon's Echo, Apple's Siri or Facebook's "M", the latter being a shopping assistant in Facebook's Messenger app. Some people proclaim that we are experiencing a new generation of user interfaces: conversational UX, where users interact with a system by "simply saying" what they want. Last week, Microsoft introduced developer frameworks for conversational bots as a core part of their cloud strategy. And just yesterday, Facebook announced the Bot API for its Messenger.

When new concepts and trends in technology emerge, it is a good idea to gain first-hand practical experience before adding to that first rising slope of the hype cycle. So, over the last few months I have run some experiments that I now want to share with you.

Not yet an uprising, but the bots are slowly coming… Photo: Francesco Mondada, Michael Bonani, CC BY-SA 3.0, Source

Continue reading "Forget Apps – Bots Are the Future of Geo"