Dataswarm memetic analysis of German Election
We have been tracking the German election using social media since February, and 6.8 million tweets later we have some predictions. Our Data Analytics Engine (which correctly predicted Brexit, the US Election and the latest British one, against what the polls were saying) is now predicting for Germany... well, pretty much what the German polls are saying, except for the AfD.
As of today (Friday 21st, 2 days before the election), our system is predicting the following outcomes:
(Note: the % quoted here is each party's share relative to the 7 parties considered; these 7 sum to 100%.)
(*Update: to convert to a predicted election outcome we assume these 7 parties together take a 95% share of the total vote. The predicted actual vote share is the figure given in the brackets.*)
AfD 16% – but with a wide range of c 12 – 20% (Predicted Election outcome = 15%)
CDU 26% (Predicted Election outcome = 25%)
CSU 8% (CDU + CSU = 34%) (Predicted Election outcome = 8%, CDU + CSU = 33%)
FDP 7% (Predicted Election outcome = 7%)
Grüne 7% (Predicted Election outcome = 7%)
Linke 9% (Predicted Election outcome = 8%)
SPD 27% (Predicted Election outcome = 25%)
The range is typically +/- 2% (except for the AfD, as discussed below).
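The conversion described in the update note can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not our engine's code: the names `RELATIVE_SHARES`, `TRACKED_PARTIES_TOTAL` and `to_vote_share` are made up for this example, and the only inputs are the relative figures listed above and the 95% assumption.

```python
# Relative shares (%) among the 7 tracked parties, as quoted in the list above.
RELATIVE_SHARES = {
    "AfD": 16, "CDU": 26, "CSU": 8, "FDP": 7,
    "Grüne": 7, "Linke": 9, "SPD": 27,
}

# Assumption from the update note: the 7 parties together take 95% of all votes.
TRACKED_PARTIES_TOTAL = 0.95


def to_vote_share(relative_shares, tracked_total=TRACKED_PARTIES_TOTAL):
    """Scale relative shares (summing to 100) to shares of the whole vote."""
    total = sum(relative_shares.values())
    return {
        party: share / total * 100 * tracked_total
        for party, share in relative_shares.items()
    }


if __name__ == "__main__":
    # Print the unrounded scaled share for each party.
    for party, share in to_vote_share(RELATIVE_SHARES).items():
        print(f"{party}: {share:.2f}%")
```

The straight scaling gives unrounded values (for example 15.2% for the AfD and 25.65% for the SPD). The whole-number figures quoted in the brackets above sum to exactly 95, so a few of them differ from this straight scaling by up to a point.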
The usual caveats
Note that German Twitter usage is relatively low as a % of population. As was true in the UK and US in the low penetration years, it may still lean left if users are still mainly early adopters, so our predicted SPD and Linke figures may be a few % too high. Ditto, the FDP may be under-represented, so its figure could be too low.
But the real question mark is over the AfD. Why is our system showing a most probable result so much higher than the polls, and a wider range? (Incidentally, it shows there is even a remote chance the result could be greater than 20%, so could there be another upset?)
Or is it just wrong? Prediction is always difficult, especially about the future, as they say.
Why is the system predicting the AfD vote is higher than the polls?
Over the months that we’ve been tracking the election we have seen the following:
- The AfD have been mentioned, specifically or generally, in over 40% of all the tweets, and in over 50% in the last 4 weeks, as there has been a marked spike. The official polls are also picking up a rise in popularity.
- The system is generating a c 16% share of the vote for the AfD vs all the other parties, as above. Relative to the AfD's share of tweet mentions, this is a low ratio compared to the other parties.
- The first thought is – bots! However, we don’t see a huge number of bots in this election – c 5-10% were obvious bots, and our systems scrub the vast majority anyway, so we can rule out a huge volume of low grade bot tweets. Our friends at the Oxford Internet Institute have also found low bot activity in the German election.
Our system has been fairly accurate so far, but there are some oddities in this particular sample:
- Despite this raw volume, our system calculates that the AfD tweeters have a slightly lower total reach than the CDU and SPD (see the relative sizes of the bubbles in the graph above), so the average participant is far less connected in general.
- This large amount of activity is also significantly less influential (in the diagram above, they are situated quite far to the left of the other two main parties on a log scale x axis), so they are also not connected to the “right people”.
- German Twitter usage is relatively low volume for the population overall, so we may simply not have a representative sample (though the predictions for the other parties are pretty close to the polls).
So why are the polls lower?
Two options – (i) our system is just wrong, or (ii) it is seeing the correct value, and the polls are under-reporting. In all elections we have tracked so far it has been the latter case with alternative parties, so we are inclined to back it this time too. There are a number of factors that could be driving this:
Firstly, it could be that people – the “shy AfD voters” – don’t want to tell the pollsters what they think, as:
- A large volume of the tweets mentioning the AfD are negative. This was also the case with Trump, Brexit, Corbyn and Le Pen and didn’t put off voters, but in the German context it may matter more and dissuade voters from giving their views.
- The amount of mainstream media reporting of the AfD seems to be relatively low compared to its reporting of the other parties, so there is not an impression of scale in “the bubble” of pollsters, pundits and politicos (this “bubble belief system” was also a big factor driving underestimation in Brexit and the US elections, and that was despite far more media attention).
- What mainstream reporting there is, is also largely negative, including some fairly hostile accusations and associations. This again may suppress poll results.
Secondly, it may be that the AfD story is just not having impact because it has no really heavyweight champions:
- As noted above, the amount of mainstream media reporting of the AfD is relatively low, and mainly hostile. That is different to all the other elections we’ve watched, where the “alternative party” was well reported in the media and typically also positively championed by one or more mainstream media organs.
- The AfD has no really high profile personalities to lead the charge – no Trump, Farage, Corbyn, Le Pen etc, who were big names and led the other upsets from the front. If one ranks Merkel’s influence at 100%, only Petry (no longer leader) ranked anywhere near (c 70%); the others are all far less influential and barely register.
In brief, the system has seen the “right” picture in all previous elections, and we are betting it will do so again. Our ingoing assumption is that the “Shy AfD voter” effect is in play, because it has been before.
Could we be wrong and the polls right?
Yes, we could be wrong, because some conditions are different as noted above and it may be that despite the support we are seeing those “Shy AfD voters” just will not come out to vote in the numbers the system is predicting they will:
- As noted above, previous campaigns have had high profile leaders and media champions validating the vote choice of the “alternate” movements. Do the grassroots need the leaders to make them want to come out, or will they come out on their own?
- In Brexit, the US and the 2017 UK elections the “incumbents” ran poor campaigns. In France, President Macron ran a good campaign and garnered a large amount of support from the voters “floating” between the 1st and 2nd rounds. In Germany, Angela Merkel has run a fairly good campaign. Will this reduce the voters’ will to come out and vote “anyone but X”?
So the question is which comes first – the voters or the leaders? Will this body of voters still come out and vote without any high profile leaders and major media organs spurring them on, and with an actively hostile media against them? Does the AfD have a “ground game” to get them out – or will the “Shy AfD supporters” decide it’s just better to stay home?
Could there be a shock?
So far in our experience, a volume of tweets/tweeters this large with such low reach and influence means either lots of bots, or a grassroots movement of very unimportant but very active people.
- However, we don’t see a huge number of bots in this election – c 5-10% were obvious bots – and nor do others who are looking at it.
- So, if they are not bots, chances are this is a high rate of grassroots activity from a smaller number of unimportant people. But as they are so motivated, will they come out and vote? Or is it just all talk and meme-passing?
We haven’t seen a volume this large be so low impact in a political setting before. But it is coming from a low base, and with these volumes small changes in the other factors drive a very large variance in the % vote share in the model. If a few things go “right” the vote % rises quite fast when you model this situation. If things “go right” in real life, the vote % could be quite a lot bigger than our base prediction.
In short, we think the AfD result is likely to be a few % higher than the polls say, and if there is a shock, we won’t be shocked, and nor will you be now.
Incidentally, this sort of conundrum is exactly what led us to tracking elections in the first place, as there is a known end result that one can then use to trace back and calibrate the system’s algorithms for their more usual role of tracking and predicting customer insight, intent etc.