In most academic fields, presenting statistics and data is essential. Words like 'values', 'equations', 'numbers', and 'tests' are common in theses and papers. But how are these words used, and which other words do they usually combine with? In this analysis, we explore which phrases authors use most often when they present data and statistics.

Our analysis

We built a data set of 300 million sentences from published papers. From these sentences, we extracted all three-word combinations following the pattern subject + verb + object (for example, 'data shows difference').
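To make the extraction step concrete, here is a minimal sketch of how a subject + verb + object triple could be pulled from a parsed sentence. It assumes each sentence has already been run through a dependency parser; the `Token` class, the `extract_triples` function, and the labels `nsubj` and `dobj` are illustrative assumptions, not a description of the pipeline actually used for this analysis.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str   # the word as it appears in the sentence
    dep: str    # dependency label (assumed labels: "nsubj", "dobj", "ROOT", ...)
    head: int   # index of the token this word depends on


def extract_triples(tokens):
    """Return (subject, verb, object) triples for verbs that have both."""
    subjects, objects = {}, {}
    for tok in tokens:
        if tok.dep == "nsubj":
            subjects[tok.head] = tok.text
        elif tok.dep == "dobj":
            objects[tok.head] = tok.text
    # Keep only verbs that govern both a subject and an object.
    return [
        (subjects[i], tokens[i].text, objects[i])
        for i in sorted(subjects)
        if i in objects
    ]


# Hand-annotated parse of "The data shows a difference" (hypothetical input).
sentence = [
    Token("The", "det", 1),
    Token("data", "nsubj", 2),
    Token("shows", "ROOT", 2),
    Token("a", "det", 4),
    Token("difference", "dobj", 2),
]
triples = extract_triples(sentence)
print(triples)
```

Because the triple is read from the parse rather than from word order, the three words do not need to sit next to each other in the sentence.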

We then collected the 100 most frequent combinations along with their frequencies, and visualized them (see image below). The three most-used triples were 'equation have solution', 'data provide evidence', and 'test show difference'.

Note that all phrases are lemmatized: each count is the total across all inflected forms of the words. For example, the phrase 'test show difference' includes 'tests showing differences', 'tests showed differences', and others. The combined words were also not necessarily adjacent in the original sentence; for instance, an occurrence of 'test show difference' might have been 'test A showed a small difference' in the original paper.
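As a rough illustration of how lemmatized counting works, the sketch below maps each word of a triple to its base form and then tallies the results. The `LEMMAS` table and the sample triples are made up for this example; a real pipeline would rely on a proper lemmatizer and far more data.

```python
from collections import Counter

# Toy lemma table (an assumption for this sketch, not a real lemmatizer).
LEMMAS = {
    "tests": "test",
    "shows": "show",
    "showed": "show",
    "showing": "show",
    "differences": "difference",
    "provides": "provide",
}


def lemmatize(triple):
    """Map every word in a triple to its base form, lowercased."""
    return tuple(LEMMAS.get(word.lower(), word.lower()) for word in triple)


# Made-up surface triples, as they might come out of the extraction step.
raw_triples = [
    ("tests", "showing", "differences"),
    ("tests", "showed", "differences"),
    ("test", "showed", "difference"),
    ("data", "provide", "evidence"),
    ("data", "provides", "evidence"),
]

counts = Counter(lemmatize(t) for t in raw_triples)
top = counts.most_common(2)  # the analysis above used the top 100
print(top)
```

All three 'test' variants collapse into the single phrase 'test show difference', which is why one lemmatized phrase can stand for many surface forms.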


The image below shows the most frequently used word combinations. The subject is shown in bold, the verb in regular script, and the object in italics. The figure uses hierarchical clustering, with the phrases first being grouped by subject and then by verb.
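The grouping behind such a figure can be sketched as a nested dictionary: subjects at the top level, verbs below them, and objects at the leaves. This is a plain two-level grouping rather than a clustering algorithm, and the counts and the `group_triples` function are invented for illustration.

```python
from collections import defaultdict

# Made-up counts for a handful of lemmatized triples.
counts = {
    ("data", "provide", "evidence"): 5,
    ("data", "provide", "information"): 3,
    ("data", "show", "difference"): 4,
    ("test", "show", "difference"): 6,
}


def group_triples(counts):
    """Group triples by subject, then by verb, keeping (object, count) pairs."""
    tree = defaultdict(lambda: defaultdict(list))
    for (subj, verb, obj), n in counts.items():
        tree[subj][verb].append((obj, n))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {subj: dict(verbs) for subj, verbs in tree.items()}


tree = group_triples(counts)
for subj, verbs in tree.items():
    print(subj)
    for verb, objs in verbs.items():
        print("   " + verb + " -> " + ", ".join(obj for obj, _ in objs))
```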

Not surprisingly, 'data' is the most frequent subject. It is often combined with the verbs 'provide', 'show', and 'support'. For example, data 'provide' 'evidence', 'information', or 'insights'; data 'show' 'differences', 'increases', and 'correlations'; and data 'support' 'hypotheses', 'notions', and 'ideas'. The subject 'test' is also frequent and is most often followed by 'reveal', 'indicate', or 'show', with '(a) difference' as the object.

Next time you’re writing your methods or results section and you’re stuck for words, see if this image helps you! It might give you the words you’re looking for.

About the author

Hilde is Chief Applied Linguist at Writefull.

