Predicting 6 Election results correctly in a tumultuous year – some lessons


Above – system map of the US Election dataswarm in process. Trump is in front – this was at a time the mainstream US media and polling was saying Clinton was far in front, and we then knew an upset was very likely coming.

Using our systems we have managed to predict 5 elections (strictly speaking 6 as the French one is 2 rounds) correctly over the last year, 3 very much against all the polling predictions (Brexit, US, UK 2017) and one that still produced “shock” results (Germany). We went public on 5 of them, before election day in each case. (The bullet points below link to blog posts of each of the individual election prediction summaries)

  • Predicted Trump win – we went public with our prediction 3 days before the election. The polls said Hillary was a shoo-in, but the system saw a close race and a dynamic support base that Trump had built up from the beginning, and plumped for him.
  • Predicted UK election – went public 2 days before. The system predicted a hung parliament, we got one. Polls were predicting a major Tory win.
  • Predicted French Election (both rounds) – went public 2 days before both. Got both right. System underestimated scale of Macron win in second round, but in our review we learned how to more accurately predict future voter flow from the system’s memetic linkages – major learning point bonus!
  • Predicted German Election – went public 2 days before. Predicted AfD surge and CDU crash better than any polls.
  • Brexit was an initial side project, done with a bit of social media sampling and system dynamic modelling, but as that worked and we got it right – despite all the poll opinions being completely the opposite – we decided to carry on with other elections, as we found that a “known” outcome at a point in time allowed us to calibrate the algorithms quite well.

We find that in commercial work, measuring aspects like “sentiment” or “influence” can give rather nebulous results. Measuring the true picture is not always simple. How do these metrics really work, and how do they relate to what we can measure? Using known outcomes like elections would really help calibrate these metrics, and we could show our clients that our system was working correctly. Besides, if we got a few elections right we might open up a new area of business!

Each election taught us something new about the way the algorithms should work, and what needed tweaking or expanding, and we learned some new insights about how human behaviour in reality maps to the metrics you can “see” with social media analytics. We also learned quite a bit about how media works in persuading people, and how “fake news” et al operates. We also proved to ourselves that the system was good in 2 English speaking cultures (US and UK), and could also operate in French & German language and culture – and could crunch tens of millions of units of social media data.

We also had quite a few insights about what is happening at a population level politically, and our views differ quite a lot from a lot of the “conventional wisdom” we saw during and after the elections.

These are some of the high level takes on each election:

Brexit, 2016

We did not put the main system on this, but used a series of small data samples over the election period, and then built a fairly simple system dynamic model to predict outcomes of a 2-horse election race. Turned out it worked, and against all the conventional wisdom and poll predictions to boot. The Remain camp (the favourites) seemed to go in with a static strategy and refused to shift it when it was clear it was losing. In essence, Remain’s arguments, amplified by “Project Fear” style messaging, were increasingly perceived by neutrals as exaggerating the risks. That led to growing resistance to their message and gave the pro-Leave media a foothold to start landing some telling blows (with similarly exaggerated, but more positive, claims). Cycle this through a few times, increase the hectoring volume as Remain started to panic a bit, and more and more people just switched off from these messages. Also, the increasingly vicious insults aimed at Leave voters (more on insults later in the paper) alienated many people. The sampling showed rising support for Brexit, and it became clear to us that Remain were mortally wounded and would lose (probably – there is always a range of error in these things).
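The feedback loop described above can be illustrated with a toy stock-and-flow model. This is only a sketch under our own assumptions, not the model we actually ran: the three stocks, the function name `simulate_two_horse`, and every rate and starting share are made up for illustration.

```python
def simulate_two_horse(days, favourite=0.50, challenger=0.40, undecided=0.10,
                       persuasion=0.02, backlash=0.005):
    """Step a three-stock model one day at a time.

    Each day a fraction `persuasion` of the undecided stock commits to each
    camp, and a fraction `backlash` of the favourite's supporters, put off
    by the favourite's messaging, switches to the challenger.
    All parameter values are hypothetical.
    """
    for _ in range(days):
        to_favourite = persuasion * undecided
        to_challenger = persuasion * undecided
        switched = backlash * favourite
        favourite += to_favourite - switched
        challenger += to_challenger + switched
        undecided -= to_favourite + to_challenger
    return favourite, challenger, undecided


# Same starting point, with and without the backlash flow.
no_backlash = simulate_two_horse(30, backlash=0.0)
with_backlash = simulate_two_horse(30)
print(no_backlash, with_backlash)
```

With these invented numbers the favourite keeps its lead when the backlash flow is switched off and loses it when a small daily backlash is switched on, which is the qualitative pattern described in the paragraph above.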

The US 2016 Election

We tracked the primaries to “train” the main system, and it correctly predicted Trump & Clinton wins, though Sanders was a major challenger to Clinton for a very long time. We don’t count these elections as some of our predictions, as we never predicted them per se (Maybe we should in future).

In tracking the actual Presidential election race from August – Sept 2016, we saw Trump was winning from the beginning. We believed the polls were wrong and suspected another upset was coming. Clinton had caught up a lot by the end, but the system was saying Trump edged it. We thought it was going to be very close, and certainly not the Clinton landslide the polls were all predicting. We had seen this in Brexit as well so we decided to trust the system. We gulped and went to press, calling it for Trump, and he won (only just, as the system predicted). Why did we see what was right while the polls got it so wrong? We think there were a number of causes. We think we were getting a more realistic picture via social media, but frankly another reason seemed to be the pollsters themselves: they seemed unable to believe what the data said.

The UK 2017 Election

The Tories were supposed to have gone into the UK election with a major lead. We saw no huge Tory lead, ever; they were just a bit ahead. In week 1 the LibDems made most of the running but quickly fell away as the main parties got going. After another week or so Labour started to close the gap on the Tories. Labour’s Manifesto was a step change – from about then on Labour started to gain on them much faster. As with Remain in the Brexit vote, the Tories seemed to go in with a static strategy and didn’t shift it when it was clear it was losing ground. The only topics to “break the surface” as very influential were Brexit and the NHS, with Scotland a distant 3rd. All others were in the noise; it wasn’t the Economy, stupid. From about 2 weeks before election day our system was predicting a hung parliament. Now here comes the embarrassing bit. We data analytic humans messed it up: we “knew” from the UK 2010 and 2015 elections that social media was still a bit left leaning, so we over-compensated for that and called a range, from a hung parliament to a stay-the-same outcome, with a reduced majority as the most likely outcome. It was a hung parliament, the system was bang-on, and we decided from then on to “trust the system”. (We were still far closer than any polls, by the way; they were all predicting an enlarged Tory majority.)

The French 2017 Election – Round One

This was more complex than the previous 3 elections we had tracked in that there were multiple candidates, but with support levels far closer to each other than the US primaries where clear winners emerged. Our system indicated Macron and Le Pen were front runners, Macron had more support than Le Pen, and it was proved right. Another correct call to us. But we had an interesting lesson here. Quite close to the end the system was saying another candidate, Hamon, looked like a strong competitor, even potentially leading Le Pen. The French polls, which are usually very accurate, said he had started well but was fading fast. When we looked at our latest daily data he was indeed not near the front. It turned out his support had fallen over the first round but our algorithms were hanging on to too much history, over-weighting his early high ratings. That was a very useful lesson in algorithm trend tracking calibration!
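The “too much history” problem can be shown with a small sketch. It compares a plain average over the whole campaign with an exponentially weighted one in which older readings fade. The function names, the half-life parameter and the daily figures are all invented for illustration; this is not our production algorithm.

```python
def cumulative_mean(series):
    """Average over the whole campaign: every day counts equally."""
    return sum(series) / len(series)


def exponentially_weighted(series, half_life_days):
    """Weighted average where a reading loses half its weight
    every `half_life_days` days; the latest reading has weight 1."""
    n = len(series)
    weights = [0.5 ** ((n - 1 - i) / half_life_days) for i in range(n)]
    return sum(w * x for w, x in zip(weights, series)) / sum(weights)


# Made-up daily support shares (%) for a candidate who starts well, then fades.
fading = [20, 20, 19, 18, 16, 14, 12, 10, 8, 7]

long_memory = cumulative_mean(fading)
short_memory = exponentially_weighted(fading, half_life_days=2)
print(long_memory, short_memory)
```

On a falling series like this one, the long-memory estimate sits well above the latest reading while the short-memory estimate stays much closer to it. The trade-off is that a short half-life also reacts more strongly to day-to-day noise.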

The French 2017 Election – Round Two

The second round was a two horse race, as with Brexit, the US and the UK elections. Easier going for us. Our system indicated Macron was in front, Le Pen was second, and we predicted he would win and we got it right. But the system said Le Pen was closer behind than the eventual polls showed. This was the only time in all the elections that we tracked that the polls were more accurate than we were. So we started to backtrack through the system’s calculations and found something very interesting. We found the relationships of other candidates to Macron or Le Pen predicted the relative shift of that candidate’s supporters to the two finalists. Summing the relative shifts of those voters gave a far closer split to the actual election results. We had just stumbled over how to analyse future voter intent. We did not have this problem with Clinton/Sanders, probably because there was more time between primary and final election so voter shifts had largely happened by the time voting started.
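The idea of summing relative shifts can be sketched as follows. This is a minimal illustration under assumptions of our own: the `affinity` scores stand in for whatever linkage strength the system measures between an eliminated candidate and each finalist, the candidates are labelled A to D rather than named, all numbers are invented, and abstention is ignored.

```python
def project_runoff(first_round, affinity, finalists):
    """Reallocate eliminated candidates' support to the two finalists.

    first_round: candidate -> first-round share (%)
    affinity:    eliminated candidate -> {finalist: relative closeness score}
    Returns finalist -> projected second-round share (%).
    """
    totals = {f: first_round[f] for f in finalists}
    for candidate, share in first_round.items():
        if candidate in finalists:
            continue
        scores = affinity[candidate]
        total_score = sum(scores[f] for f in finalists)
        for f in finalists:
            # Split this candidate's voters in proportion to closeness.
            totals[f] += share * scores[f] / total_score
    grand_total = sum(totals.values())
    return {f: 100 * v / grand_total for f, v in totals.items()}


# Hypothetical four-candidate first round; A and B go through.
first_round = {"A": 30, "B": 25, "C": 25, "D": 20}
affinity = {"C": {"A": 3, "B": 1},   # C's supporters lean 3:1 towards A
            "D": {"A": 1, "B": 1}}   # D's supporters split evenly
projection = project_runoff(first_round, affinity, ("A", "B"))
print(projection)  # {'A': 58.75, 'B': 41.25}
```

Looking only at the finalists' own first-round support would give A about 54.5% (30 out of 55); adding the transferred voters widens the gap, which is the kind of correction described above.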

The German 2017 Election

Tuning the system using the lessons learned in the French rounds helped hugely, as again it was a multi-horse race, which is more complex. We called the German election pretty well, given this challenge. We predicted both the “shock” AfD jump and the CDU/CSU drop correctly, against the polls’ predictions. Our overall error % was the same as the polls on the day. Why? Most of our error was due to another “fast faller”, the SPD. Despite our tweaks for Monsieur Hamon in France, the system still over-weighted the early history, so we over-estimated SPD support. More tweaking will be required on this. Another point – we found underlying AfD support is larger than the election would suggest, but that its influence on the voting outcome was lower than the other parties’ on the day. We think true AfD support is several % points higher. This will probably come to light once they get into parliament, and get more exposure. Assuming, that is, they don’t fall apart instead – their largest personality (by far), Frauke Petry, left the day they were sworn into the Parliament!

The System

The system’s logic has in our view proved it is dependable over a range of conditions, languages etc, and that the algorithms are seeing and indicating a true picture and are fairly accurate in their predictions.  We have learned a lot about calibrating some of the more vague terms such as sentiment and influence. We had also built our own hardware stack (we found commercial cloud systems too slow / too expensive for this sort of data crunching) so we were quite pleased it held up to the task. We also noted that we got these results not by spending long hours tramping the ground, or the focus group grind, but by our computers crunching tens of millions of social media data items. The only walking we had to do was to the coffee machine.

Political Trends

There is a huge amount of interesting data we now have to look at political trends and insights, and to derive lessons for future elections, and we are going through it all with great interest – but some general points that were immediately apparent are:

  • In all 4 countries, over 6 elections, there are a large number of voters – traditional working class and many traditional “non voters” – who turned out for the candidates who promised a shift away from the status quo. Politicians ignore this trend at their peril.
  • There seems to be a decline in the centre of politics, and a shift to the poles of each political spectrum. Various countries’ electoral systems and parties handle this in various ways but the trend is marked.
  • That “insult” thing we mentioned above – casting ad hominem insults at the other side’s voters seems to be a bad idea. We noted in all elections, in various ways, that insulting the other side’s voters just created more of them. So a lesson for future candidates – don’t call the other side’s voters “Deplorables” (and worse) – that is guaranteed to generate more of them.
  • Humans are humans; culture, language and electoral system are lower order variables – 6 elections, 3 languages, 4 cultures, the same approach worked.
  • In similar vein, the system also worked across various alleged levels of “Fake News”, Russian involvement, electorate manipulation, voting machine rigging, bot spoiling etc, and pretty much delivered a good result each time. The system handles most bots quite well (in our humble opinion). Given that our algorithms were basically the same and produced roughly the same quality of predictive results across apparently highly varying levels of all these spoilers over a number of elections, we are starting to think that “Fake News”, “The Russians” et al were not the decisive factors in any of these elections that some believe they were.
  • In addition, there is some form of major systemic problem with US and UK polling; the French and German polls mapped to the final results far more closely.

Lots more to come on this aspect over time though. What we can say is that we disagree with quite a bit of the “popular” analysis of the political press and pundits about why various things happened, and what it all means going forward.