European Parliament groups and how they vote

Based on a set of 3299 votes obtained through VoteWatch.eu, we create a dataset for analysis using Pandas. After an initial visual exploration with heatmaps, we compute a Euclidean distance matrix between all pairs of political groups. A cluster map is then created from the distance matrix using Ward clustering, showing how the different groups cluster together in a dendrogram that reflects the relative distance between all of them.

Using DBSCAN and Spectral Clustering on the computed affinity matrix we obtain the separate groups that each method identifies, and using multidimensional scaling (MDS) we create 2D and 3D maps of the relative distance between all groups - in total and by Policy Area.

Importing data

This data was extracted from http://votewatch.eu - a site I recommend to anyone who wants to follow the political activity in the EU; it consists of ~3300 votes from the last quarter of 2020.

eu_v.head()

	ID	Date	Policy Area	Name	For	Abstentions	Result	GUE-NGL	S&D	Greens/EFA	REG	EPP	ECR	IDG	NI
0	2324	9/14/2020	Transport & tourism	Sustainable rail market in view of COVID-19 ou...	64	0	Adopted	For	For	For	For	For	For	For	For
1	2325	9/14/2020	Budget	Draft amending budget no 8: Increase of paymen...	62	2	Adopted	For	For	For	For	For	For	For	For
2	2326	9/14/2020	Regional development	Proposal for a Council decision authorising Po...	64	1	Adopted	For	For	For	For	For	For	For	For
3	2327	9/14/2020	Culture & education	Effective measures to “green” Erasmus+, Creati...	65	0	Adopted	For	For	For	For	For	Against	Abstain	For
4	2328	9/14/2020	Environment & public health	The EU’s role in protecting and restoring the ...	65	0	Adopted	For	For	For	For	For	For	For	For

Each row is a vote, and the columns include each political group's position (as defined by VoteWatch.eu), the result and the policy area, amongst others.

eu_v.columns

Index(['ID', 'Date', 'Policy Area', 'Name', 'For', 'Against', 'Abstentions',
       'Result', 'GUE-NGL', 'S&D', 'Greens/EFA', 'REG', 'EPP', 'ECR', 'IDG',
       'NI'],
      dtype='object')

Looking at the data

Information on the political groups can be obtained directly from the European Parliament site (https://www.europarl.europa.eu/about-parliament/en/organisation-and-rules/organisation/political-groups). A very brief description of each group follows, based on that information and on direct quotes (when possible) from their official sites:

Group of the European People’s Party (Christian Democrats): “The EPP Group is the largest and oldest group in the European Parliament. A centre-right group, we are committed to creating a stronger and self-assured Europe, built at the service of its people. Our goal is to create a more competitive and democratic Europe, where people can build the life they want.”
Group of the Progressive Alliance of Socialists and Democrats: “The S&D Group is the leading centre-left political group in the European Parliament and the second largest. Our MEPs are committed to fighting for social justice, jobs and growth, consumer rights, sustainable development, financial market reform and human rights to create a stronger and more democratic Europe and a better future for everyone.”
Renew Europe Group: “There has never been a larger centrist group in the European Parliament. By ending the dominance of the Conservatives and the Socialists, Europeans have given us a strong mandate to change Europe for the better. At a time when the rule of law and democracy are under threat in parts of Europe, our Group will stand up for the people who suffer from the illiberal and nationalistic tendencies that we see returning in too many countries.”
Group of the Greens/European Free Alliance: “The Greens/European Free Alliance is a political group in the European Parliament made up of Green, Pirate and Independent MEPs as well as MEPs from parties representing stateless nations and disadvantaged minorities. The Greens/EFA project is to build a society respectful of fundamental human rights and environmental justice: the rights to self-determination, to shelter, to good health, to education, to culture, and to a high quality of life”
Identity and Democracy Group: “Identity and Democracy (ID) is a new group, which is the fourth largest one in the current European Parliament”; “The Members of the ID Group base their political project on the upholding of freedom, sovereignty, subsidiarity and the identity of the European peoples and nations. They acknowledge the Greek-Roman and Christian heritage as the pillars of European civilisation.”
European Conservatives and Reformists Group: “The ECR Group is a centre-right political group in the European Parliament, founded in 2009 with a common cause to reform the EU based on euro-realism, respecting the sovereignty of nations, and focusing on economic recovery, growth and competitiveness. From its 8 founding Member States with 54 MEPs in 2009, we now have 62 members from 15 EU Member States. The ECR Group is at the forefront of generating forward-looking policy proposals to design a reformed European Union that is more flexible, decentralised and respects the wishes of its Member States. Only an EU that truly listens to its people can offer real solutions to the problems we face today.”
The Left group in the European Parliament - GUE/NGL: “Our group brings together left-wing MEPs in the European Parliament. We stand up for workers, environment, feminism, peace & human rights. What unites us is the vision of a socially equitable and sustainable Europe based on international solidarity. The European Union must become a project of its people and cannot remain a project of the elites. We want equal rights for women and men, civil rights and liberties and the enforcement of human rights. Anti-Fascism and anti-racism are also a strong part of the tradition of left movements in Europe.”

There is an additional group listed in the table: NI, which stands for Non-Inscrits. Strictly speaking this isn’t a group: it bundles every MEP that doesn’t belong to one. As per the Wikipedia article (https://en.wikipedia.org/wiki/Non-Inscrits) the current MEPs come from different political backgrounds.

To visualise how the different groups vote, an initial approach is a simple heatmap; to that end we subset the dataframe on the political groups only and replace the voting indication with numerical values.

The resulting dataframe is simply a list of voting sessions with a numeric indication of each group’s vote:

votes_hm=eu_v[["GUE-NGL","S&D", "Greens/EFA", "REG", "EPP", "ECR", "IDG", "NI"]]
votes_hmn = votes_hm.replace(["For", "Against", "Abstain", "No political line"], [1,-1,0,0])
votes_hmn

	GUE-NGL	S&D	Greens/EFA	REG	EPP	ECR	IDG	NI
0	1	1	1	1	1	1	1	1
1	1	1	1	1	1	1	1	1
2	1	1	1	1	1	1	1	1
3	1	1	1	1	1	-1	0	1
4	1	1	1	1	1	1	1	1
...	...	...	...	...	...	...	...	...
3294	-1	-1	-1	-1	-1	-1	1	0
3295	1	1	1	-1	-1	0	-1	1
3296	-1	-1	-1	-1	-1	-1	1	-1
3297	-1	-1	-1	-1	-1	-1	1	-1
3298	0	1	1	1	1	1	-1	1

3299 rows × 8 columns
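As a hedged aside, the same conversion can be written with an explicit mapping. The sketch below uses a tiny sample (rows 3 and 4 of the table above, for three of the groups) and a hypothetical dictionary name, VOTE_VALUES, whose values mirror the replace() call. Unlike replace(), which leaves unexpected labels untouched, map() turns them into NaN, so a missing label is easy to spot.

```python
import pandas as pd

# Small sample: rows 3 and 4 of the table above, for three of the groups.
sample = pd.DataFrame(
    {
        "EPP": ["For", "For"],
        "ECR": ["Against", "For"],
        "IDG": ["Abstain", "For"],
    },
    index=[3, 4],
)

# Hypothetical name; the values mirror the replace() call used above.
VOTE_VALUES = {"For": 1, "Against": -1, "Abstain": 0, "No political line": 0}

# Map every column through the dictionary; labels missing from it become NaN.
sample_numeric = sample.apply(lambda col: col.map(VOTE_VALUES))

# Fail early if some label was not covered by the mapping.
assert not sample_numeric.isna().any().any()
print(sample_numeric)
```

On the full dataset the same two lines would be applied to votes_hm instead of the sample.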

Using Seaborn (https://seaborn.pydata.org/) we can then visualise it.

import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns


voting_palette = ["#FB6962","#FCFC99","#79DE79"]

fig = plt.figure(figsize=(8,8))
sns.heatmap(votes_hmn,
            square=False,
            yticklabels = False,
            cbar=False,
            cmap=sns.color_palette(voting_palette),
           )
plt.show()

This visualisation alone can provide some initial insights; for example, the IDG seems to abstain more than the rest, and the ECR group appears to vote Against more often than average. In general, groups from the centre-right to the right wing seem to vote Against more often than the others.
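These visual impressions can be checked by counting positions per group. The following is a minimal sketch on the ten rows of the numeric table displayed above, restricted to three groups; ten rows are far too few to be representative, so the numbers only illustrate the mechanics. On the full data the same call would be applied to votes_hmn.

```python
import pandas as pd

# The ten rows displayed above (first five and last five), for three groups.
sample = pd.DataFrame(
    {
        "EPP": [1, 1, 1, 1, 1, -1, -1, -1, -1, 1],
        "ECR": [1, 1, 1, -1, 1, -1, 0, -1, -1, 1],
        "IDG": [1, 1, 1, 0, 1, 1, -1, 1, 1, -1],
    }
)

# Count how often each numeric position occurs in every column.
counts = (
    sample.apply(lambda col: col.value_counts())
    .reindex([1, 0, -1])   # fixed row order: For, Abstain/no line, Against
    .fillna(0)             # a position that never occurs counts as zero
    .astype(int)
)
counts.index = ["For", "Abstain / no line", "Against"]

# Share of each position per group.
shares = counts / len(sample)
print(counts)
print(shares)
```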

Who votes with whom? Determining convergence in voting

An initial approach to determine how similar or dissimilar the groups are is simply to determine how many times they have voted exactly in the same way, which is what the following table reflects:

import collections

import numpy as np
import pandas as pd

pv_list = []
#print("Total voting instances: ", votes_hm.shape[0])

## Not necessarily the most straightforward way (check .crosstab or .pivot_table, possibly with pandas.melt and/or groupby)
## but follows the same approach as before in using a list of dicts
for party in votes_hm.columns:
    pv_dict = collections.OrderedDict()
    for column in votes_hmn:
        pv_dict[column]=votes_hmn[votes_hmn[party] == votes_hmn[column]].shape[0]
    pv_list.append(pv_dict)

pv = pd.DataFrame(pv_list,index=votes_hm.columns)
pv

	GUE-NGL	S&D	Greens/EFA	REG	EPP	ECR	IDG	NI
GUE-NGL	3299	2318	2670	2026	1612	978	940	2433
S&D	2318	3299	2642	2841	2449	1571	1087	2536
Greens/EFA	2670	2642	3299	2370	1943	1184	866	2690
REG	2026	2841	2370	3299	2708	1790	1233	2255
EPP	1612	2449	1943	2708	3299	2162	1504	1947
ECR	978	1571	1184	1790	2162	3299	1916	1265
IDG	940	1087	866	1233	1504	1916	3299	1028
NI	2433	2536	2690	2255	1947	1265	1028	3299
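As the code comment hints, the loop is not the only way to get these counts. One possible alternative, sketched here on the ten displayed rows for three groups, compares all columns at once with NumPy broadcasting; it is an illustration under those assumptions, not the method used for the table above.

```python
import numpy as np
import pandas as pd

# The ten rows displayed earlier (first five and last five), for three groups.
sample = pd.DataFrame(
    {
        "EPP": [1, 1, 1, 1, 1, -1, -1, -1, -1, 1],
        "ECR": [1, 1, 1, -1, 1, -1, 0, -1, -1, 1],
        "IDG": [1, 1, 1, 0, 1, 1, -1, 1, 1, -1],
    }
)

values = sample.to_numpy()

# Compare every column with every other column, vote by vote:
# shape (n_votes, n_groups, 1) against (n_votes, 1, n_groups).
same_vote = values[:, :, None] == values[:, None, :]

# Summing over the votes gives the number of identical positions per pair.
agreement = pd.DataFrame(
    same_vote.sum(axis=0), index=sample.columns, columns=sample.columns
)
print(agreement)
```

The diagonal equals the number of votes, as in the table above, and the matrix is symmetric by construction.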

Using a heatmap (but this time for a different purpose and with different options) we can visualise that data in a better way.

fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot()

sns.heatmap(
    pv,
    cmap=sns.color_palette("mako_r"),
    linewidth=1,
    annot = True,
    square =True,
    fmt="d",
    cbar_kws={"shrink": 0.8})
plt.title('European Parliament, identical voting count (2020-08-01 to 2021-01-01)')

plt.show()

From this we can see, for example, that the group with which GUE/NGL has converged the least is the IDG (the IDG itself has converged the least with the Greens/EFA, followed by GUE/NGL), or that S&D converges the most with the REG (and vice-versa). This approach, while already useful, only considers the proximity based on absolute convergence - is there a better way?

The Distance Matrix of the political groups

One improvement is to reflect the differences in voting behaviour: a party that votes In Favour is closer to a party that Abstains than to one that votes Against. Based on this principle we compute the pairwise Euclidean distance between all groups and create a distance matrix.

from scipy.spatial.distance import pdist, squareform
import scipy.cluster.hierarchy as hc

## Transpose the dataframe used for the heatmap
votes_t = votes_hmn.transpose()

## Determine the Euclidean pairwise distance
## ("euclidean" is actually the default option)
pwdist = pdist(votes_t, metric='euclidean')

## Create a square dataframe with the pairwise distances: the distance matrix
distmat = pd.DataFrame(
    squareform(pwdist), # pass a symmetric distance matrix
    columns = votes_t.index,
    index = votes_t.index
)

distmat

	GUE-NGL	S&D	Greens/EFA	REG	EPP	ECR	IDG	NI
GUE-NGL	0.000000	59.974995	45.155288	69.354164	80.318118	90.972523	89.218832	52.009614
S&D	59.974995	0.000000	48.435524	41.988094	57.323643	77.711003	86.434947	49.879856
Greens/EFA	45.155288	48.435524	0.000000	58.932164	71.833140	86.377080	90.746901	42.000000
REG	69.354164	41.988094	58.932164	0.000000	47.906158	72.166474	83.078276	60.224580
EPP	80.318118	57.323643	71.833140	47.906158	0.000000	60.819405	76.256147	69.598851
ECR	90.972523	77.711003	86.377080	72.166474	60.819405	0.000000	60.149813	82.903558
IDG	89.218832	86.434947	90.746901	83.078276	76.256147	60.149813	0.000000	85.924385
NI	52.009614	49.879856	42.000000	60.224580	69.598851	82.903558	85.924385	0.000000

The findings can be read in a similar way to the previous analysis: for example, the Greens/EFA group is closest to the NI group and furthest away from the IDG, while the EPP is most distant from GUE/NGL and has the REG as its closest group.
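To make the distance more tangible: with votes coded as 1, 0 and -1, each vote on which two groups differ by one step (For or Against versus Abstain) adds 1 to the squared distance, and each vote on which they are directly opposed adds 4. The sketch below checks this on the ten displayed rows for the EPP and the IDG; the variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist

# The ten rows displayed earlier (first five and last five), for two groups.
epp = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, 1])
idg = np.array([1, 1, 1, 0, 1, 1, -1, 1, 1, -1])

diff = np.abs(epp - idg)
one_step = int((diff == 1).sum())   # For/Against on one side, Abstain on the other
opposite = int((diff == 2).sum())   # For on one side, Against on the other

# Euclidean distance between the two voting records, as computed by pdist above.
distance = pdist(np.vstack([epp, idg]), metric="euclidean")[0]

# The squared distance is a weighted count of the disagreements.
assert np.isclose(distance ** 2, one_step + 4 * opposite)
print(one_step, opposite, distance)
```

An identical-vote count would treat all five disagreements the same, whereas the distance weighs the four direct oppositions more heavily than the single abstention.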

This pairwise analysis can fortunately be grouped automatically: we use Ward clustering to obtain a dendrogram that can be combined with a heatmap, that is, a clustermap, which has the advantage of automatically reordering the columns and rows to show how the groups are positioned in terms of distance.

## Perform hierarchical linkage on the distance matrix using Ward's method.
distmat_link = hc.linkage(pwdist, method="ward", optimal_ordering=True)

g = sns.clustermap(
    distmat,
    annot = True,
    cmap=sns.color_palette("Greens_r"),
    linewidth=1,
    #standard_scale=1,
    row_linkage=distmat_link,
    col_linkage=distmat_link,
    figsize=(10,10))

## Add the title to the figure that holds the clustermap
g.fig.suptitle('European Parliament, euclidean distance and Ward clustering \n(2020-08-01 to 2021-01-01), Clustermap')

plt.show()

The results are much more readable; we can clearly see that:

The first split separates the IDG and ECR from the rest.
The next split separates the EPP, REG and S&D (the last two closer together)
Finally the GUE/NGL, the Greens/EFA and the NI constitute a separate branch (with GUE/NGL branching out first)
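The three branches read off the dendrogram can also be extracted programmatically. The sketch below rebuilds the Ward linkage from the distances printed in the matrix above (typed in as a condensed vector) and cuts the tree into three flat clusters with SciPy's fcluster; the choice of three clusters is an assumption made to match the reading above.

```python
import numpy as np
import scipy.cluster.hierarchy as hc

groups = ["GUE-NGL", "S&D", "Greens/EFA", "REG", "EPP", "ECR", "IDG", "NI"]

# Upper triangle of the distance matrix printed above, row by row
# (the condensed form that pdist returns).
condensed = np.array([
    59.974995, 45.155288, 69.354164, 80.318118, 90.972523, 89.218832, 52.009614,
    48.435524, 41.988094, 57.323643, 77.711003, 86.434947, 49.879856,
    58.932164, 71.833140, 86.377080, 90.746901, 42.000000,
    47.906158, 72.166474, 83.078276, 60.224580,
    60.819405, 76.256147, 69.598851,
    60.149813, 82.903558,
    85.924385,
])

linkage = hc.linkage(condensed, method="ward", optimal_ordering=True)

# Cut the tree so that at most three flat clusters remain.
labels = hc.fcluster(linkage, t=3, criterion="maxclust")

clusters = {}
for group, label in zip(groups, labels):
    clusters.setdefault(int(label), []).append(group)
print(clusters)
```

The cluster numbers themselves are arbitrary; only the membership matters.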

DBSCAN and Spectral Clustering

An additional line of inquiry is to determine, based on the relative affinity, how many groups can be identified, or how the parties cluster when divided into a fixed number of clusters.

The first step in answering this is to compute the affinity matrix from the distance matrix. We start by normalising the distance matrix.

import numpy as np

distmat_mm=((distmat-distmat.min().min())/(distmat.max().max()-distmat.min().min()))*1
pd.DataFrame(distmat_mm, distmat.index, distmat.columns)

	GUE-NGL	S&D	Greens/EFA	REG	EPP	ECR	IDG	NI
GUE-NGL	0.000000	0.659265	0.496362	0.762364	0.882883	1.000000	0.980723	0.571707
S&D	0.659265	0.000000	0.532419	0.461547	0.630120	0.854225	0.950121	0.548296
Greens/EFA	0.496362	0.532419	0.000000	0.647802	0.789614	0.949485	0.997520	0.461678
REG	0.762364	0.461547	0.647802	0.000000	0.526600	0.793278	0.913224	0.662008
EPP	0.882883	0.630120	0.789614	0.526600	0.000000	0.668547	0.838233	0.765054
ECR	1.000000	0.854225	0.949485	0.793278	0.668547	0.000000	0.661187	0.911303
IDG	0.980723	0.950121	0.997520	0.913224	0.838233	0.661187	0.000000	0.944509
NI	0.571707	0.548296	0.461678	0.662008	0.765054	0.911303	0.944509	0.000000

We can now obtain the affinity matrix.

affinmat_mm = pd.DataFrame(1-distmat_mm, distmat.index, distmat.columns)
affinmat_mm 

	GUE-NGL	S&D	Greens/EFA	REG	EPP	ECR	IDG	NI
GUE-NGL	1.000000	0.340735	0.503638	0.237636	0.117117	0.000000	0.019277	0.428293
S&D	0.340735	1.000000	0.467581	0.538453	0.369880	0.145775	0.049879	0.451704
Greens/EFA	0.503638	0.467581	1.000000	0.352198	0.210386	0.050515	0.002480	0.538322
REG	0.237636	0.538453	0.352198	1.000000	0.473400	0.206722	0.086776	0.337992
EPP	0.117117	0.369880	0.210386	0.473400	1.000000	0.331453	0.161767	0.234946
ECR	0.000000	0.145775	0.050515	0.206722	0.331453	1.000000	0.338813	0.088697
IDG	0.019277	0.049879	0.002480	0.086776	0.161767	0.338813	1.000000	0.055491
NI	0.428293	0.451704	0.538322	0.337992	0.234946	0.088697	0.055491	1.000000

We will use density-based spatial clustering of applications with noise (DBSCAN) as our data clustering algorithm.

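from sklearn.cluster import DBSCAN

## Note: without metric="precomputed", DBSCAN treats each row of the affinity
## matrix as a feature vector and measures the Euclidean distance between rows
dbscan_labels = DBSCAN(eps=1.1).fit(affinmat_mm)

## labels_ holds one label per group; -1 marks points that DBSCAN considers noise
dbscan_dict = dict(zip(distmat_mm,dbscan_labels.labels_))
dbscan_dict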

{'GUE-NGL': 0,
 'S&D': 0,
 'Greens/EFA': 0,
 'REG': 0,
 'EPP': 0,
 'ECR': -1,
 'IDG': -1,
 'NI': 0}

We get a simple split: DBSCAN finds a single cluster (label 0) containing six of the groups, and labels the ECR and the IDG as noise (-1), that is, as not belonging to any cluster.
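DBSCAN can also work directly on pairwise distances when it is told that the matrix is precomputed. The sketch below does this on the normalised distance matrix, rebuilt from the distances printed earlier; the eps and min_samples values are illustrative assumptions and not part of the original analysis.

```python
import numpy as np
from scipy.spatial.distance import squareform
from sklearn.cluster import DBSCAN

groups = ["GUE-NGL", "S&D", "Greens/EFA", "REG", "EPP", "ECR", "IDG", "NI"]

# Upper triangle of the (non-normalised) distance matrix printed earlier.
condensed = np.array([
    59.974995, 45.155288, 69.354164, 80.318118, 90.972523, 89.218832, 52.009614,
    48.435524, 41.988094, 57.323643, 77.711003, 86.434947, 49.879856,
    58.932164, 71.833140, 86.377080, 90.746901, 42.000000,
    47.906158, 72.166474, 83.078276, 60.224580,
    60.819405, 76.256147, 69.598851,
    60.149813, 82.903558,
    85.924385,
])

# The smallest entry of the full matrix is the zero diagonal, so the min-max
# normalisation used above reduces to a division by the largest distance.
dist_norm = squareform(condensed / condensed.max())

# eps and min_samples are assumptions chosen for illustration.
model = DBSCAN(eps=0.6, min_samples=3, metric="precomputed").fit(dist_norm)
labels = dict(zip(groups, model.labels_.tolist()))
print(labels)
```

With these particular settings the split matches the one above: one cluster of six groups, with the ECR and the IDG left as noise. Other eps values can give different splits.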

A different approach is to use Spectral Clustering, an algorithm that can be initialised with a predetermined number of clusters; here we set it to 3.

from sklearn.cluster import SpectralClustering
sc = SpectralClustering(3, affinity="precomputed",random_state=2020).fit_predict(affinmat_mm)
sc_dict = dict(zip(distmat,sc))

print(sc_dict)

{'GUE-NGL': 2, 'S&D': 0, 'Greens/EFA': 2, 'REG': 0, 'EPP': 0, 'ECR': 1, 'IDG': 1, 'NI': 2}

The results are consistent with what one would expect when looking at the previous clustermap:

One group with the ECR and IDG
One group with the S&D, REG and EPP
One group with the GUE/NGL, Greens/EFA and NI

Multidimensional Scaling

Based on what we’ve done above we can now visualise the relative distances between all the groups in a map: this can be achieved by Multi-dimensional scaling, a method that reduces the dimensions while keeping the relative distances.

What this means is that we can reduce the data to 2 or 3 dimensions and obtain a plot of how close the parties are that maintains the relative distance; we can also use the information obtained from Spectral Clustering in the form of the colours of the data points, thus combining relative distance and clustering.

from sklearn.manifold import MDS

mds = MDS(n_components=2, dissimilarity='precomputed',random_state=2020, n_init=100, max_iter=1000)

## We use the normalised distance matrix but results would
## be similar with the original one, just with a different scale/axis
results = mds.fit(distmat_mm.values)
coords = results.embedding_
coords
## Graphic options
sns.set()
sns.set_style("ticks")

fig, ax = plt.subplots(figsize=(8,8))

plt.title('European Parliament, MDS \n(2020-08-01 to 2021-01-01)', fontsize=14)

for label, x, y in zip(distmat_mm.columns, coords[:, 0], coords[:, 1]):
    ax.scatter(x, y, c = "C"+str(sc_dict[label]), s=250)
    ax.axis('equal')
    ax.annotate(label,xy = (x-0.02, y+0.025))
plt.show()

This view is perhaps one of the most useful in getting an overview of how the political groups relate to each other based on their voting records.
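How faithful such a map is can be checked by comparing the distances between the plotted points with the original ones. The sketch below rebuilds the normalised distance matrix from the values printed earlier, runs MDS with the same arguments as above (with fewer restarts, an assumption made to keep it quick), and reports the stress together with the correlation between original and embedded distances.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Upper triangle of the (non-normalised) distance matrix printed earlier.
condensed = np.array([
    59.974995, 45.155288, 69.354164, 80.318118, 90.972523, 89.218832, 52.009614,
    48.435524, 41.988094, 57.323643, 77.711003, 86.434947, 49.879856,
    58.932164, 71.833140, 86.377080, 90.746901, 42.000000,
    47.906158, 72.166474, 83.078276, 60.224580,
    60.819405, 76.256147, 69.598851,
    60.149813, 82.903558,
    85.924385,
])
dist_norm = squareform(condensed / condensed.max())

# Same arguments as above, except for a smaller n_init.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=2020,
          n_init=10, max_iter=1000)
coords = mds.fit_transform(dist_norm)

# Pairwise distances between the 2D points, in the same order as `condensed`.
embedded = pdist(coords)

# A correlation close to 1 means the map keeps the proportions of the
# original distances well; the stress is the quantity MDS minimises.
correlation = np.corrcoef(condensed, embedded)[0, 1]
print(f"stress: {mds.stress_:.4f}, correlation: {correlation:.3f}")
```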

The 3D equivalent can be seen here:

from sklearn.manifold import MDS
import mpl_toolkits.mplot3d  ## registers the '3d' projection on older Matplotlib versions
mds = MDS(n_components=3, dissimilarity='precomputed',random_state=2020, n_init=100, max_iter=1000)

## We use the normalised distance matrix but results would
## be similar with the original one, just with a different scale/axis
results = mds.fit(distmat_mm.values)
coords = results.embedding_

## Graphic options
sns.set()
sns.set_style("ticks")


fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111, projection='3d')

fig.suptitle('European Parliament, MDS \n(2020-08-01 to 2021-01-01)', fontsize=14)
ax.set_title('MDS with Spectral Clustering clusters (3D)')

for label, x, y, z in zip(distmat_mm.columns, coords[:, 0], coords[:, 1], coords[:, 2]):
    ax.scatter(x, y, z, c="C"+str(sc_dict[label]),s=250)
    ## ax.text stands in for the annotate3D helper, whose definition is not shown here
    ax.text(x, y, z, str(label), fontsize=10, ha='right', va='bottom')
plt.show()


MDS per Policy Area

Finally we can apply the 2D MDS and clustering to each individual Policy Area; the approach is the same but applied to a subset of the votes, providing the relative distance of the parties in the different domains.

for area in eu_v["Policy Area"].unique():
    varea=eu_v[eu_v["Policy Area"] == area]
    avotes_hm=varea[["GUE-NGL","S&D", "Greens/EFA", "REG", "EPP", "ECR", "IDG", "NI"]]
    avotes_hmn = avotes_hm.replace(["For", "Against", "Abstain", "No political line"], [1,-1,0,0])
 
    avotes_t = avotes_hmn.transpose()
    apwdist = pdist(avotes_t, metric='euclidean')
    adistmat = pd.DataFrame(
        squareform(apwdist), # pass a symmetric distance matrix
        columns = avotes_t.index,
        index = avotes_t.index)
    adistmat_mm=((adistmat-adistmat.min().min())/(adistmat.max().max()-adistmat.min().min()))*1
    
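    ## Use this area's own distances (not the global distmat_mm) so that the clusters are computed per Policy Area
    aaffinmat_mm = pd.DataFrame(1-adistmat_mm, adistmat.index, adistmat.columns)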

    asc = SpectralClustering(3, affinity="precomputed",random_state=2020).fit_predict(aaffinmat_mm)
    asc_dict = dict(zip(adistmat,asc))   
    
    amds = MDS(n_components=2, dissimilarity='precomputed',random_state=2020, n_init=100, max_iter=1000)
    aresults = amds.fit(adistmat_mm.values)
    acoords = aresults.embedding_
    
    sns.set()
    sns.set_style("ticks")

    fig, ax = plt.subplots(figsize=(8,8))

    plt.title(area, fontsize=14)

    for label, x, y in zip(adistmat_mm.columns, acoords[:, 0], acoords[:, 1]):
        ax.scatter(x, y, c = "C"+str(asc_dict[label]), s=250)
        #ax.scatter(x, y, s=250)
        ax.axis('equal')
        ax.annotate(label,xy = (x-0.02, y+0.025))
    plt.show()
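One caveat when looping over Policy Areas: an area with very few votes may have all groups voting identically, in which case the largest distance is zero and the normalisation divides by zero. The sketch below wraps the distance step in a hypothetical helper, normalised_distances, that returns None in that case so the area can be skipped; the two small samples correspond to rows 0 and 1 (unanimous) and rows 3 and 4 of the table shown earlier, for three groups.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def normalised_distances(votes_numeric):
    """Return the normalised group distance matrix, or None if it is undefined.

    Hypothetical helper: `votes_numeric` has one row per vote and one column
    per group, with votes coded as 1, 0 and -1.
    """
    votes_t = votes_numeric.transpose()
    distances = squareform(pdist(votes_t, metric="euclidean"))
    largest = distances.max()
    if largest == 0:
        # Every group voted identically on every vote: nothing to normalise.
        return None
    # The minimum is the zero diagonal, so min-max scaling divides by the maximum.
    return pd.DataFrame(distances / largest, index=votes_t.index, columns=votes_t.index)


unanimous = pd.DataFrame({"EPP": [1, 1], "ECR": [1, 1], "IDG": [1, 1]})
split = pd.DataFrame({"EPP": [1, 1], "ECR": [-1, 1], "IDG": [0, 1]})

print(normalised_distances(unanimous))
print(normalised_distances(split))
```

Inside the loop, an area for which the helper returns None could simply be skipped with continue before the clustering and MDS steps.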