Two Master’s students won the prestigious SemEval-2016 competition
Huge numbers of posts are published on social media every day, in which people share their opinions and feelings about what is going on in the world. To evaluate these opinions automatically, computers must interpret and 'understand' the content and, in particular, the attitude of the writer. At one of the most important international competitions in text sentiment analysis, two ETH Master's students designed the most accurate algorithm. Jan Deriu and Maurice Gonzenbach won the 2016 SemEval text sentiment classification competition, placing first out of 34 teams from all over the world.
The task of the competition was to automatically assign ten thousand English messages from the micro-blogging service Twitter to one of the sentiment classes 'positive', 'neutral' and 'negative'. For this purpose, Jan and Maurice designed a neural network. First, they trained it on a large data set of hundreds of millions of tweets that contained positive or negative smileys. This allowed the computer to find correlations between patterns in the text and the presence of a positive or negative smiley, which conveys the corresponding sentiment. In this way, the rules emerged automatically from large amounts of data.
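The idea of letting smileys act as labels is often called distant supervision. The following is a minimal sketch of how such smiley-labelled training data could be assembled. It is not the students' code, and their neural network is not reproduced here. The emoticon sets, the whitespace tokenization and the helper names `distant_label` and `word_polarity_counts` are hypothetical assumptions chosen for illustration. Tweets that contain no smiley, or smileys of both polarities, are discarded. The smileys themselves are removed so that whatever is learned has to come from the remaining words.

```python
import re
from collections import Counter

# Hypothetical emoticon sets; the article does not say which smileys were used.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'(", "=("}
ALL_SMILEYS = POSITIVE | NEGATIVE

# Simplification: split on whitespace, so "great:)" is not recognized.
TOKEN_RE = re.compile(r"\S+")


def distant_label(tweet):
    """Return (text_without_smileys, label) or None if the tweet is unusable.

    The label is "positive" or "negative" depending on which kind of smiley
    appears; tweets with no smiley or with both kinds are treated as ambiguous.
    """
    tokens = TOKEN_RE.findall(tweet)
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:  # neither kind of smiley, or both kinds
        return None
    # Strip the smileys so a model must rely on the other words.
    text = " ".join(t for t in tokens if t not in ALL_SMILEYS)
    return text, "positive" if has_pos else "negative"


def word_polarity_counts(tweets):
    """Count how often each lower-cased word co-occurs with each noisy label."""
    counts = {"positive": Counter(), "negative": Counter()}
    for tweet in tweets:
        labelled = distant_label(tweet)
        if labelled is None:
            continue
        text, label = labelled
        counts[label].update(word.lower() for word in text.split())
    return counts


if __name__ == "__main__":
    sample = [
        "Great game tonight :)",
        "Missed my train again :(",
        "No smiley here",
        "Mixed feelings :) :(",
    ]
    for tweet in sample:
        print(distant_label(tweet))
    print(word_polarity_counts(sample)["positive"].most_common(3))
```

Simple word counts like these only hint at the correlations described above. A neural network trained on hundreds of millions of such examples can pick up far richer patterns, but the labelling step follows the same principle.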
At the competition, the output of their algorithm was compared with ten thousand hand-annotated tweets. Jan and Maurice reached an accuracy of 63 percent against the manual labels, outperforming all other teams from more than 20 countries. This value might not sound impressive. However, human raters typically agree on the sentiment of a text in no more than 80 percent of cases. This illustrates how difficult the task is for computers, particularly since tweets are very short and written in informal language, often with slang and creative spelling. "We are really proud of Jan and Maurice's achievement," says Martin Jaggi, one of the supervisors of the Master's students. "The competition is like an IQ test for the participating artificial intelligence systems."
In June 2016, Jan and Maurice will present their results in San Diego at one of the leading conferences in computational linguistics. They are already preparing for the next competition: determining whether a text was written by, for example, a thirteen-year-old girl or a fifty-year-old man. The approach will be the same: training an algorithm on a large set of text documents annotated with the age and gender of the writer.