Why hackers should be afraid of how they write
The write stuff ... Drexel University stylometry researchers Mike Brennan, Ariel Stolerman, Andrew McDonald, Aylin Caliskan Islam, Sadia Afroz and Rachel Greenstadt.
It's been used for hundreds of years to question or confirm the authorship of Shakespeare's plays, Homer's Iliad and Odyssey and St Paul's letters.
Now the science of stylometry could be used in the fight against hackers, trolls and malware writers that wreak havoc on the web.
At the same time, stylometry - the analysis of a person's unique writing style - could also be used by employers to identify whistleblowers or whingers among their staff.
What you say online could be traced back to you using stylometry. Photo: Mikael Altemark/Flickr
"Your writing style can give you away and on the internet anonymity is difficult to achieve," say the US researchers who have developed online tools to analyse writing.
The researchers, from Drexel University in Philadelphia, studied the leaked conversations and contributions of hundreds of anonymous users in underground online forums.
By applying stylometric analysis to match writing styles to authors, they were able to identify 80 per cent of users.
"Most people are not aware how sensitive their writing style can be," said Sadia Afroz, one of the researchers and a PhD candidate in computer science.
The findings could have repercussions for anyone who doesn't want their writing to be traced back to them. "I read many long anonymous answers... on very confidential stuff [like illegal drug use or confidential information about a prisoner] that can put the writers in danger if their identities are known," she said.
“People share very confidential information thinking that they are anonymous.”
There is also potential for law enforcement and government agencies to use these techniques on underground markets for stolen goods, phishing kits and malware tools - although a large amount of data is required to achieve results.
A minimum of 5000 words was required for analysis to be performed, greatly culling the list of potential targets in the US research. The study, which used the “gold standard” of 6500 words, was presented to an audience at the 29C3 Chaos Communication Congress in Germany in late December.
Hiding behind different anonymous accounts was no longer possible, even if authors used different IP addresses or coded languages such as the Internet alphabet called leetspeak, Ms Afroz said.
Word choice, sentence structure, syntax and punctuation are all giveaways.
Australian linguistic expert Alexis Antonia, from the University of Newcastle's Centre for Literary and Linguistic Computing, said idiosyncrasies were largely unconscious and could create something akin to a verbal fingerprint.
"The relative frequencies of function words constitute a set of readily quantifiable features which distinguish between the texts of different authors," Conjoint Fellow Antonia said.
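To give a rough sense of what "relative frequencies of function words" means in practice, here is a minimal Python sketch. It is not the Drexel researchers' method, nor how JStylo works; the short word list, the function names and the cosine-similarity comparison are all assumptions chosen for illustration, and real studies draw on far larger feature sets and thousands of words per author.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "but"]


def function_word_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # guard against empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_author(unknown_text, known_texts):
    """Return the author name whose function-word profile best matches the unknown text.

    known_texts maps an author name to a sample of that author's writing.
    """
    target = function_word_profile(unknown_text)

    def score(name):
        return cosine_similarity(target, function_word_profile(known_texts[name]))

    return max(known_texts, key=score)
```

Calling closest_author with an unattributed post and a dictionary of writing samples from known authors returns the name with the most similar profile. With samples as short as a sentence or two the result means very little, which is consistent with the large word counts the researchers say they needed.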
When applied to underground networks online, the findings could identify malware writers or botnet masters and their topics of discussion from leaked conversations, which are “available in the wild”, according to Ms Afroz.
To counter the implications for privacy and security, the researchers have created two open-source tools. The first, JStylo, recognises an individual writer's style. The second, Anonymouth, is used to "anonymise" writing by providing user-specific suggestions for changing writing style.
The tools can be downloaded online.