Author Rank study: analysis of the top 480 profiles on Google+ Italy

On the 16th of January, Maurizio Ceravolo published a huge study of the top 480 profiles on Google+ in Italy.

"Huge" is an understatement: the original article is more than 23,000 words long and, printed on paper, would probably make a novella. Maurizio has authorized me to adapt his findings into English, and I’ll do my best to fit the main concepts into a more digestible format.

Who is Maurizio Ceravolo?

According to his LinkedIn profile, Maurizio is a web/mobile/Windows developer; according to this post, he is a big data scientist; and according to his G+ shares, he is very passionate about astronomy.

The study

Maurizio downloaded the data from the 480 most followed profiles on Google+ Italy.
This analysis had several goals:
  • guessing how Author Rank may work
  • working out how authorship could be calculated
  • finding out who the most engaging authors are
  • finding out which strategies/actions engage followers the most.
Processing such an amount of data naturally led to some unexpected discoveries, only some of which are discussed in this first part.
An Excel file with a lot of aggregated data, derived from the raw data extracted from Google+, is available for download. You can process it and draw your own conclusions.

Tools and methods

Maurizio wrote his own code against the Google+ APIs to download the raw data and store it in a database. Unfortunately, the APIs could not supply all the data needed, so he wrote additional code to download data from the public profiles. Eventually all the data was moved into a database so that SQL could be used to aggregate it. Excel was used as a presentation tool for the aggregated data, and the node graph was rendered with Gephi. An impressive amount of brain power went into the whole process.
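To give an idea of what "using SQL to aggregate data" can look like, here is a minimal sketch. It is not Maurizio's code: it assumes a hypothetical, flattened `posts` table (one row per post with its +1, comment and reshare counts), uses Python's built-in `sqlite3` module with an in-memory database, and ranks authors by their average number of interactions per post.

```python
import sqlite3


def engagement_by_author(rows):
    """Average interactions (+1s + comments + reshares) per post, per author.

    rows: iterable of (post_id, author, plusones, comments, reshares).
    The table layout is a hypothetical simplification, not the study's schema.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE posts (post_id TEXT PRIMARY KEY, author TEXT,"
        " plusones INTEGER, comments INTEGER, reshares INTEGER)"
    )
    con.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT author,
               COUNT(*) AS posts,
               AVG(plusones + comments + reshares) AS avg_interactions
        FROM posts
        GROUP BY author
        ORDER BY avg_interactions DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


sample = [("p1", "alice", 10, 2, 1), ("p2", "alice", 4, 0, 1), ("p3", "bob", 1, 1, 0)]
for author, posts, avg in engagement_by_author(sample):
    print(author, posts, avg)
```

The grouping happens inside the database, which is the point of moving the raw data into SQL: the same query scales from three sample rows to tens of thousands of posts.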

The representative sample

The starting sample was the list of the top 400 users in Italy on Circlecount. The geographic attribution may not be completely accurate because Circlecount relies on the data users supply in their profiles: if a user did not specify “Italy” in his/her profile, Circlecount cannot know it. The data was collected on the 12th of October, so the situation may be different now (January 2014).
While analyzing the top 400, Maurizio added 80 more Italian people who were followed by the top 400.
I’m listing the top 20 here, but you can read the whole list in the original article. The numbers are follower counts (dots separate thousands, as in Italian notation).
  • +Elia Locardi (1.956.689)
  • +Tiziano Ferro (805.975)
  • +Katie Parla (731.898)
  • +Eros Ramazzotti (391.661)
  • +Ninah Mars (302.966)
  • +Vauro Senesi (266.411)
  • +Andrea Bocelli (245.206)
  • +Daniele Devoti (196.573)
  • +Manu Dolcenera (168.540)
  • +Anna Masera (156.087)
  • +Beppe Grillo (152.036)
  • +Vasco Rossi (143.942)
  • +Luca De Biase (116.897)
  • +Vincenzo Cosenza (112.099)
  • +BAZ Marco Bazzoni (99.860)
  • +Antonio Lupetti (95.567)
  • +Donato Carriero (88.806)
  • +Claudio Gagliardini (78.430)
  • +federico mello (77.325)
  • +Gemma Costa (76.595)

What kind of data was downloaded

For each of the 480 profiles, the following was downloaded where available:
  • the last 100 public posts
  • the text of those posts
  • the names of the people who +1’d, commented on and shared them
  • where possible, the text of each comment.

Why only 100 posts?

There is a hard limit enforced by the APIs: a single API call returns at most 100 posts. Maurizio knows that it’s not a huge number, but it was enough to infer the interactions and behaviour of the people involved in the conversations.
Even with those limits, it took 2 weeks to download all the data, and the database grew to 1.2 GB! Since this amount of data was generated by the posts of just 480 people, you can imagine the storage Google needs to handle millions of people!
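The back-of-the-envelope arithmetic behind that remark can be made explicit. This sketch assumes 1 GB = 10⁹ bytes and a naive linear extrapolation; the 1,000,000-profile target is a hypothetical figure, and real systems differ in many ways, so treat the result as an order-of-magnitude illustration only.

```python
def storage_estimate(db_bytes, profiles, posts_per_profile, target_profiles):
    """Naive linear extrapolation of database size (illustrative only)."""
    per_profile = db_bytes / profiles            # bytes per downloaded profile
    per_post = per_profile / posts_per_profile   # bytes per post, incl. its interactions
    projected = per_profile * target_profiles    # same ratio applied to more profiles
    return per_profile, per_post, projected


# Figures from the study: 1.2 GB for 480 profiles x 100 posts.
# The 1,000,000-profile target is a hypothetical example.
per_profile, per_post, projected = storage_estimate(1.2e9, 480, 100, 1_000_000)
print(f"{per_profile / 1e6:.1f} MB per profile")
print(f"{per_post / 1e3:.0f} KB per post")
print(f"{projected / 1e12:.1f} TB for 1,000,000 profiles")
```

That works out to about 2.5 MB per profile, roughly 25 KB per post (text, names and comments together), and about 2.5 TB for a million profiles sampled the same way.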

Who are the top 480 accounts

Officially we have 276 men, 180 women, 20 people who left their gender unspecified and 4 who specified “other”. Five of them don’t use a real name but a brand name, for example the Napoli soccer team (and it seems to be official!). This is odd because it’s explicitly forbidden by Google’s guidelines: the team should use a business Page. It’s even odder because the account is one of the profiles Google+ suggests in the Sport category!

The plot thickens

The profile +Cesare Riccardo has a doppelganger called +Riccardo Lemons, with the same avatar and the same profile image, publishing the same content. Why? He is probably trying to get around the limit of 5000 people that a profile may follow at once. It’s probably a way to involve more people in circle sharing.
The list also includes a departed friend, Marco Zamperini, who sadly passed away just a few days after the data was collected.
If you read the whole list, you will notice some names that don’t seem Italian. They are included because they were listed as being in Italy at the time of collection, and on average they interact quite a lot with the Italian audience.
Compared with the top accounts on Twitter, where most of the dominant accounts are VIPs, Google+ has a fair share of ordinary people: only 10 are true VIPs.

Aggregating Data

All the collected accounts were categorized manually, and in some cases re-categorized during the manual review of the data. Some human interpretation is involved, because it’s impossible to categorize the accounts objectively.
The aggregated categories are:
  • Bloggers
  • Circlers: people who try to increase their follower count regardless of content quality. They usually rely on shared circles, exchanging visibility by including the people who shared their circle in a new shared circle. It’s confusing, but it’s similar to link exchange and is probably a form of Author Rank spam. They often use tags like #circleshare or #sharedcircle.
  • Commercial intent: fake pages or sales pages
  • Culture: including arts, architecture, law, science, writing
  • VIPs: musicians, politicians, show business, VIPs in general.
  • Photography: real photographers. There are quite a lot of them, and one of them is the top account of Google+ Italy!
  • Images: generic images, Facebook-like content, motivational posts, love, quotes, etc.
  • Information: news and journalism
  • Other: for content that doesn’t fit any other category.
  • Politics: because somebody has to give their opinion on how to run things
  • Entertainment: an aggregated category including adult, animals, cooking, esotericism, fashion, music, sport
  • Web, social, tech: also including content from Google and Google employees

Oddly enough for the Italian market, soccer is almost absent from the top accounts: only two people talk about sport, and they are almost ignored by their followers. (Translator’s note: rabid soccer fans have Facebook.)

Of the 480 profiles:
  • 89 don’t let others see the list of people they follow
  • 17 follow less than 100 people
  • 67 follow 100-500 people
  • 40 follow 500-1000 people
  • 50 follow 1000-2000 people
  • 46 follow 2000-3000 people
  • 46 follow 3000-4000 people
  • 125 follow 4000-5000 people (5000 is the maximum number of people a profile can follow)
There is no limit on the number of followers.
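Grouping profiles into the buckets above is a simple binning step. This sketch is illustrative: the bucket edges come from the list, but the assumption that each range includes its lower bound (so exactly 500 counts as "500-1000") is mine, since the original ranges overlap at their endpoints. `None` stands for a profile whose following list is not public.

```python
import bisect
from collections import Counter

# Lower edges of the buckets in the list above; each range is assumed to
# include its lower bound (e.g. exactly 500 counts as "500-1000").
EDGES = [100, 500, 1000, 2000, 3000, 4000]
LABELS = ["<100", "100-500", "500-1000", "1000-2000",
          "2000-3000", "3000-4000", "4000-5000"]


def following_bucket(count):
    """Map a following count (None = list not public) to a bucket label."""
    if count is None:
        return "hidden"
    # bisect_right returns how many edges are <= count, i.e. the bucket index.
    return LABELS[bisect.bisect_right(EDGES, count)]


def bucket_counts(following_counts):
    """Count how many profiles fall into each bucket."""
    return Counter(following_bucket(c) for c in following_counts)


# Bucket sizes reported in the study; as a sanity check they add up to 480.
REPORTED = {"hidden": 89, "<100": 17, "100-500": 67, "500-1000": 40,
            "1000-2000": 50, "2000-3000": 46, "3000-4000": 46, "4000-5000": 125}
assert sum(REPORTED.values()) == 480

print(bucket_counts([None, 42, 500, 4999]))
```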
Together, those 480 people follow 800.000 accounts, of which around 300.000 are unique. Definitely not a private club!
Note that the list of followed accounts may be incomplete: a user can publish only part of it by making only some circles public.
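The gap between total follows (800.000) and unique accounts (300.000) means many accounts are followed by several top profiles: on average each unique account is followed by about 2.7 of them. A minimal sketch of how the two numbers differ, assuming each profile's public following list is available as a set of account names (a hypothetical representation):

```python
def follow_totals(following):
    """Return (total follow relations, number of unique followed accounts).

    following: dict mapping each top account to the set of accounts it follows
    (as far as that list is public).
    """
    total = sum(len(targets) for targets in following.values())
    unique = set().union(*following.values())  # duplicates across lists collapse
    return total, len(unique)


print(follow_totals({"a": {"x", "y"}, "b": {"y", "z"}, "c": {"y"}}))
```

Here account "y" is followed three times but counted once, so the call returns 5 relations and 3 unique accounts.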

The node graph of the relations

By downloading the list of followed accounts for each profile, it is possible to draw a graph of the relations between profiles.
  • The high resolution (4000×5000 px) graph is available here.
  • If you want to try an interactive version, click here.

Due to the sheer amount of data, the interactive version of the graph may take up to 30 seconds to load, depending on your connection speed. Definitely not recommended on a smartphone. Once it’s ready, you can click on a node to see its connections to other profiles.
Each node is drawn as a circle whose color represents its category and whose area represents its follower count.
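Because the follower count is mapped to the circle's area rather than its radius, the radius grows with the square root of the followers. We don't know exactly how the sizes were configured in Gephi; this sketch just shows the area-proportional mapping, with a hypothetical `scale` factor:

```python
import math


def node_radius(followers, scale=1.0):
    """Radius of a circle whose area is proportional to the follower count."""
    # area = pi * r**2 = followers  =>  r = sqrt(followers / pi)
    return scale * math.sqrt(followers / math.pi)


ratio = node_radius(1_956_689) / node_radius(156_087)
print(f"{ratio:.2f}")
```

For example, +Elia Locardi has about 12.5 times the followers of +Anna Masera, but his circle is only about 3.5 times wider: with area-based sizing, large accounts don't completely swamp the picture.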
As you can see, there are 3 dominant colors: red (web, social, tech), light green (images) and blue (photography). These are the most popular categories among the top accounts. Looking at the graph, almost everybody seems well connected, except a couple of outsiders with no connections.
The layout was computed with the “Force Atlas” algorithm, a force-directed layout: linked nodes attract each other while all nodes repel one another, and the nodes are moved until these forces balance. As a result, similar circles, which share more connections, end up close together. This makes sense, because people tend to follow other people with similar interests.
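To see why linked nodes cluster, here is a deliberately simplified force-directed layout in the spirit of Force Atlas. It is not Gephi's implementation: the force model (repulsion between every pair proportional to (degree+1)·(degree+1)/distance, loosely following ForceAtlas2; attraction along each link proportional to distance), the absence of gravity, the circular starting positions and all parameter values are assumptions chosen for illustration.

```python
import math


def force_layout(nodes, edges, iterations=300, kr=1.0, step=0.05, max_move=1.0):
    """Simplified Force-Atlas-style layout (illustrative sketch, not Gephi's code)."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Deterministic start: nodes spread evenly on a unit circle.
    pos = {}
    for i, n in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        pos[n] = [math.cos(angle), math.sin(angle)]
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, more strongly when close.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = kr * (degree[u] + 1) * (degree[v] + 1) / d
                # (dx/d, dy/d) is the unit vector pointing from v towards u.
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Attraction: linked nodes pull together with force equal to distance.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            force[u][0] -= dx
            force[u][1] -= dy
            force[v][0] += dx
            force[v][1] += dy
        # Move each node a small, capped step along its net force.
        for n in nodes:
            mx, my = force[n][0] * step, force[n][1] * step
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos
```

With two separate triangles, say `a1-a2-a3` and `b1-b2-b3`, each triangle settles into a compact shape while the two groups drift apart, which is the same effect that makes the photographers' "enclave" visible in the real graph.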

Major discoveries

  • Photographers have created an “enclave” and don’t connect with other groups.
  • The images group has a lot of interaction with other groups.
  • The web-social-tech group is tightly interconnected, probably because we feel the need to relate to each other, but it’s also followed a lot by other groups.
  • Circlers mainly leech off the photographers.
  • Other groups are in the middle.
One of the strange cases in the VIP group is the account of the artist Andrea Bocelli, which is almost ignored by the other top accounts. The account, the 7th most followed in Italy, follows only 9 accounts.
The best-connected account in our category (tech-web) is Claudio Gagliardini who, being a social media expert, has gathered 157 followers from the top 480! Luca Sanarica is the account that follows the largest number of top accounts: 328 out of 480.
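Both figures are degree counts restricted to the top 480: "157 followers from the top 480" is an in-degree within the group, "following 328 out of 480" an out-degree. A minimal sketch, assuming the follow lists are stored as a dict of sets (a hypothetical representation, not the study's database):

```python
from collections import Counter


def top_degrees(following, top):
    """Count follow links that stay inside the top set.

    following: dict account -> set of accounts it follows.
    top: set of top accounts.
    Returns (followers_from_top, follows_into_top), both keyed by account.
    """
    followers_from_top = Counter()  # in-degree within the top set
    follows_into_top = {}           # out-degree within the top set
    for account in top:
        # Keep only followed accounts that are themselves in the top set.
        targets = (following.get(account, set()) & top) - {account}
        follows_into_top[account] = len(targets)
        followers_from_top.update(targets)
    return followers_from_top, follows_into_top


follows = {"a": {"b", "c", "x"}, "b": {"c"}, "c": {"a", "b"}}
fin, fout = top_degrees(follows, {"a", "b", "c"})
print(fin.most_common(1), max(fout, key=fout.get))
```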

Outside the TOP but still interesting

Here are the top 10 accounts followed by our top 480 (the number is how many of the top 480 follow them):
  • +Robert SKREINER (205)
  • +Larry Page (164)
  • +Marco Scud (130)
  • +Robert Scoble (129)
  • +Sergey Brin (129)
  • +Thomas Hawk (120)
  • +Google Italia (118)
  • +Pete Cashmore (117)
  • +Trey Ratcliff (116)
  • +Alessio Jacona (114)
The account most followed by the top 480 is the photographer Robert Skreiner, ahead of even Larry Page. The list also includes two Italians, Marco Scud and Alessio Jacona, who are not in the top 480, showing that you don’t need lots of followers to get noticed by top players.
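This ranking uses the same follow lists, but counts only followed accounts that are outside the top set. A minimal sketch under the same hypothetical dict-of-sets representation:

```python
from collections import Counter


def most_followed_outsiders(following, top, n=10):
    """Rank accounts outside the top set by how many top accounts follow them."""
    counts = Counter()
    for account in top:
        # Set difference drops followed accounts that belong to the top set.
        counts.update(following.get(account, set()) - top)
    return counts.most_common(n)


following = {"a": {"x", "y", "b"}, "b": {"x"}, "c": {"x", "y", "z"}}
print(most_followed_outsiders(following, {"a", "b", "c"}, n=2))
```

Note that an outsider's own follower count plays no role here, which is exactly why accounts like Marco Scud and Alessio Jacona can rank high.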
According to the public data, most of the accounts have circled more than 1000 people, and some even more than 4000. Are all those people really that interesting? Maurizio doesn’t think so. He follows around 500 people and already finds it hard to read all their posts. One possible hypothesis is that many people are dumped into low-priority circles whose content doesn’t show up in the main stream.
Why?
Because people treat followers as a currency, a legacy behaviour from the days of MySpace. They constantly follow new people, hoping to be followed back. Circling a person generates a notification, so some people decide to circle back.
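One way to quantify this follow-back behaviour is reciprocity: the share of follow relations that are returned. The study does not report this measure; the sketch below is a hypothetical illustration on dict-of-sets follow lists, ignoring self-follows.

```python
def reciprocity(following):
    """Fraction of follow relations (u follows v) where v also follows u."""
    edges = {(u, v) for u, targets in following.items() for v in targets if u != v}
    if not edges:
        return 0.0
    mutual = sum(1 for u, v in edges if (v, u) in edges)
    return mutual / len(edges)


print(reciprocity({"a": {"b", "c"}, "b": {"a"}, "c": set()}))
```

In this toy example "a" and "b" follow each other while "c" never circles back, so two of the three relations are reciprocated. A profile built on follow-back campaigns would tend to show a high value, whatever the quality of its content.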
It seems unrealistic to use the number of followers as a measure of a profile’s success or interest. As on Facebook and Twitter, there’s also a market for buying followers to inflate the magic number. Could Author Rank, the mythical beast of Mountain View, be influenced by such a number? Hardly!

Let’s add another level of complexity

A few days ago, “private results” arrived in Italy too. How will they influence our results? Are we going to see results suggested by the people we follow, or results from the people who have interacted with us?
Since Google has been collecting interactions through the +1 button, it would be strange to use only the relations and not the interactions.
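Interactions can be turned into a graph just like relations: every +1, comment or share becomes a weighted edge from the person who interacted to the post's author. A minimal sketch, assuming a hypothetical per-post record holding the lists of names described in "What kind of data was downloaded"; every interaction counts 1, with no distinction between +1s, comments and shares (a simplifying assumption; how Google weights them is unknown).

```python
from collections import Counter


def interaction_edges(posts):
    """Weighted 'who interacts with whom' edges from post data.

    posts: iterable of dicts with an 'author' key and optional 'plusoners',
    'commenters' and 'resharers' lists of names (a hypothetical layout).
    Returns a Counter mapping (actor, author) -> number of interactions.
    """
    edges = Counter()
    for post in posts:
        author = post["author"]
        for key in ("plusoners", "commenters", "resharers"):
            for actor in post.get(key, []):
                if actor != author:  # ignore authors replying to themselves
                    edges[(actor, author)] += 1
    return edges


posts = [
    {"author": "alice", "plusoners": ["bob", "carol"], "commenters": ["bob", "alice"]},
    {"author": "bob", "plusoners": ["alice"]},
]
print(interaction_edges(posts))
```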
That’s why Maurizio collected the available content in addition to the relations… and that’s why you should stay tuned for part TWO!

Announcement

Dejan SEO is hosting a Hangout On Air on the 30th of January with the author of the study and some other Italian SEOs (including me).

The second part of the article has been published

Follow this link to read it.