Non-bacterial components of the primate microbiome:
Against a bacterial bias

ANTH 438 Primate Life History Evolution
University of Illinois at Urbana-Champaign
Dr. Kathryn B. H. Clancy


Since the discovery that the world in which we live is filled with organisms too small to see with the naked eye, biologists have drawn a line between the microorganism and the multicellular organism. However, the search for the evolutionary apex of cellular organisms has blurred this line, from the theory of lateral gene flow (Woese 1998) to the discovery that archaea and eukarya have more in common with each other than either has with bacteria (Woese and Fox 1977). In researching the microbiome – the physical commensal relationship between animals and their microbial inhabitants – it is important to consider the full diversity of microbial taxa. This paper investigates biases in microbiome research toward specific microbial taxa and considers how the less-studied microbial inhabitants may have effects that are currently unaccounted for.

Despite growing research interest in archaea since their identification by Woese and Fox (1977), little research has been conducted into their relationships with other organisms, such as humans. As the domain was originally understood as a clade of extremophiles due to their original isolation from geothermal vents (DeLong and Pace 2001), it would be expected that archaea would be a minor part of more familiar environments; however, it is now known that archaea are abundant in non-extreme environments, such as plankton, lakes, and soils (DeLong and Pace 2001). Regardless, research into the composition of human microbiota retains a strong bias toward bacteria (Lloyd-Price, Abu-Ali, and Huttenhower 2016; Huttenhower et al. 2012). Comparative primatology has often failed to expand microbiota research beyond bacteria, frequently excluding unknown or infrequent phyla from the results of DNA extraction (Garber et al. 2019; Mallott and Amato 2018; Orkin et al. 2019; Orkin, Webb, and Melin 2019). At its extreme, the term “microbe” has been used synonymously with the term “bacteria” despite the far more precise definition of the latter (for an example, see Mallott and Amato 2018).

Analysis of previous research

To assay the relative biases of microbiome research, it would be necessary to compare the references to microbial phyla in published literature to the prevalence of such microbes in microbiomes. As much research into the human microbiome is undertaken from a medical perspective, NCBI PubMed contains a wealth of full-text articles and abstracts in its catalog, which is easily indexed for specific terms and phrases. Thus, NCBI PubMed’s article archive was chosen as a sample for language analysis.

Barplot showing references on logarithmic scale, color-coded by domain. In order from greatest to fewest references: Bacteria (nonspecific); Proteobacteria; Firmicutes; Chordata; Bacteroidetes; Viruses (nonspecific); Ascomycota; Actinobacteria; Eukaryota (nonspecific); unclassified bacterial viruses; dsDNA viruses, no RNA stages; ssRNA viruses; Streptophyta; Apicomplexa; Arthropoda; Fusobacteria; Retro-transcribing viruses; Nematoda; Euryarchaeota; Spirochaetes; Verrucomicrobia; Tenericutes; Basidiomycota; Chlamydiae; Chloroflexi; Platyhelminthes; Chytridiomycota.
Figure 1. References to phyla in a PubMed search for the phrase “human microbiome.” References less specific than phylum are recorded as “domain (nonspecific)”; domains are presented as colors. References to humans, which would otherwise have been sorted into Chordata, were excluded.


A search for the exact phrase “human microbiome” was conducted on NCBI PubMed in October 2019, fetching 1,779 results, which were downloaded as an XML file. Using R version 3.6.1 (R Core Team 2019) on Microsoft Windows 10 18362.418, these results were indexed with pubmed.mineR version 1.0.16 (Rani, Ramachandran, and Shah 2019) and queried for metadata from the NCBI PubTator service (Wei et al. 2019). PubTator analyzes the text of PubMed abstracts and full-text articles to determine the names of species and taxonomic ranks mentioned in such articles. Excluding mentions of Homo sapiens, the taxonomic ranks were looked up in the NCBI Taxonomy Database (NCBI Taxonomy Database 2019), using ncbitax2lin (Xue 2019) to convert the database to CSV.
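The lookup step can be illustrated offline with a minimal sketch in R. The annotation strings and the three-row lineage table below are made-up stand-ins, and the "name>ID" layout is an assumption inferred from how the script in Appendix A splits each entry; the real PubTator output and ncbitax2lin CSV are far larger.

```r
# Hypothetical PubTator-style species annotations ("name>NCBI taxonomy ID")
annotations <- c("human>9606", "Candida albicans>5476",
                 "Methanobrevibacter smithii>2173", "humans>9606")

# Keep only the numeric NCBI ID that follows the ">" separator
ids <- as.numeric(sapply(strsplit(annotations, ">"), "[[", 2))

# Drop Homo sapiens (NCBI ID 9606)
ids_nohuman <- ids[!ids %in% 9606]

# Toy stand-in for the ncbitax2lin lineage table
lineages <- data.frame(
  tax_id = c(5476, 2173, 9606),
  superkingdom = c("Eukaryota", "Archaea", "Eukaryota"),
  phylum = c("Ascomycota", "Euryarchaeota", "Chordata"),
  stringsAsFactors = FALSE
)

# Look up the phylum of each remaining ID by matching on tax_id
phyla <- lineages$phylum[match(ids_nohuman, lineages$tax_id)]
print(phyla)
```

Using match() keeps one result per mention, so repeated mentions of the same taxon are counted separately in later tallies.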

References to taxonomic ranks more specific than phylum were simplified to phyla, and less specific references were recorded as “domain (nonspecific).” In the case of viruses, which have a separate ranking system, the NCBI value “no rank – 0” was used in place of phylum. The number of references to each phylum was totaled and categorized by domain. All R code executed is presented in Appendix A.
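The naming rule described above can be sketched in a few lines of R. The five rows of toy data and the column names (Domain, Phylum, Norank) are illustrative assumptions, not the study's data; the nested ifelse() is one way to express the fallback order and is not necessarily how the original script did it.

```r
# Toy lookup results; NA marks a rank that is absent from the lineage
refs <- data.frame(
  Domain = c("Bacteria", "Bacteria", "Viruses", "Archaea", "Bacteria"),
  Phylum = c("Firmicutes", NA, NA, "Euryarchaeota", "Firmicutes"),
  Norank = c(NA, NA, "dsDNA viruses, no RNA stage", NA, NA),
  stringsAsFactors = FALSE
)

# Prefer the viral "no rank" group, then the phylum,
# then fall back to "<domain> (nonspecific)"
refs$Name <- ifelse(!is.na(refs$Norank), refs$Norank,
             ifelse(!is.na(refs$Phylum), refs$Phylum,
                    paste(refs$Domain, "(nonspecific)")))

# Total the references per name
counts <- table(refs$Name)
print(counts)
```

In this toy input, the bacterial reference with no phylum is counted under "Bacteria (nonspecific)" rather than being discarded, which mirrors how nonspecific references appear in Figure 1.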


Of the 1,779 articles indexed, 1,730 mentioned a taxonomic rank or species. This totaled 5,354 individual mentions, of which 2,419 were to species or ranks other than humans. As visible in Figure 1, bacteria made up the majority of references, with the phyla Proteobacteria and Firmicutes second only to nonspecific references to bacteria as a domain. Eukarya were second to bacteria in total number of references; however, the most frequently discussed phylum within the domain was Chordata, which likely stems from discussion of the microbiomes of other vertebrates, such as Mus musculus, rather than indicating the presence of chordates in the human microbiome. Viruses were referenced moderately frequently, with nonspecific virus references making up the greatest proportion of the domain’s coverage, followed by bacteriophages. Finally, archaea were referenced least frequently of all, with nine total references, fewer than any viral phylum.
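For readers who want the counts above as proportions, the arithmetic can be reproduced with a short R sketch. The variable names are hypothetical, and treating the nine archaeal references as a share of the 2,419 non-human mentions is an assumption about how those references were counted.

```r
# Counts reported in the text
articles_total     <- 1779
articles_with_taxa <- 1730
mentions_total     <- 5354
mentions_nonhuman  <- 2419
mentions_archaea   <- 9

# Mentions of humans are the remainder of all mentions
mentions_human <- mentions_total - mentions_nonhuman

# Helper: percentage rounded to one decimal place
pct <- function(part, whole) round(100 * part / whole, 1)

shares <- c(
  articles_with_taxa  = pct(articles_with_taxa, articles_total),
  human_mentions      = pct(mentions_human, mentions_total),
  archaea_of_nonhuman = pct(mentions_archaea, mentions_nonhuman)
)
print(mentions_human)
print(shares)
```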

The Role of Non-Bacterial Organisms in the Microbiome


Yeasts have been found to colonize infant guts gradually, with an isolation rate of 13% at 28-46 days old (Benno, Sawada, and Mitsuoka 1984) and 50% as soon as four months of age (Ellis-Pegler, Crabtree, and Lambert 1975). Notably, this rate of colonization was consistent between breast-fed and formula-fed infants, in contrast to the colonization rates of bacterial phyla which differ significantly between the two feeding mechanisms. From a life-history perspective, this presents a novel question of how yeast colonize the infant gut and whether the mechanism by which breastmilk contains microbiota associated with the mother’s gut microbiome (Martin and Sela 2013) is also capable of providing yeast, whose eukaryotic cells differ in size by orders of magnitude from prokaryotic cells.

However, research conducted on yeast inhabitants of animals focuses primarily on their pathological role. The genus Candida is well known for its role in infectious disease in humans, particularly in the immunocompromised (Kourkoumpetis et al. 2011). However, species of yeast, including many Candida spp., have frequently been identified on the human body, including the vagina, gut, and skin, without any associated mycosis (Manolakaki et al. 2010). Even for species known to be infectious agents, between 12% (Chow et al. 1986) and 80% (Sobel 2007) of asymptomatic healthy women are vaginal carriers. Rather than being viewed through the lens of infection, yeast should be considered a part of the ecosystem that makes up the human body. How it reaches and colonizes infants, and why it makes up far less of the microbial population than do bacterial commensals, merits further study.


In 1966 it was discovered that humans both exhale and flatulate methane, which is not known to be biologically produced by any bacterium or eukaryote. Following this observation, methanogenic prokaryotes, from what would later be called the archaea, were identified in human feces (Nottingham and Hungate 1968). The archaeal domain remains the only taxon with members known to be biological producers of methane (Thauer and Shima 2006), and as such is unique in its residence within the human gut. Despite widespread colonization by a diverse array of archaeal species, the ecological niche they occupy within the human body and whether they are transferred vertically or environmentally remain unknown (Dridi, Raoult, and Drancourt 2011). It has been theorized that many more lineages of archaea remain undiscovered, as the DNA analysis methods used to identify bacterial genomes are incompatible with archaeal cell walls (Horz 2015).


Despite not fitting many definitions of biological life, viruses are the most diverse form of biological material on the planet. They are known to outnumber bacteria by a factor of nearly ten (Oren, Bratbak, and Heldal 1997) and are a major driver of evolution in prokaryotes. Although humans are susceptible to a wide variety of viruses, the viral community in the microbiome, so vast that it has itself been called a virome (Wylie, Weinstock, and Storch 2012), is dominated by bacteriophages whose sequences are novel (Dutilh et al. 2014). The phages identifiable by genetic sequence are primarily those known to attack bacterial phyla associated with the microbiome, yet they appear to be vastly more diverse than the virome of the ocean (Waller et al. 2014). In contrast to bacterial colonization, for which theories exist, how such a vast and dynamic virome is able to establish itself by adulthood remains unknown. The effect of age, and whether milk carries a virome of similar diversity, would be intriguing avenues for further study.


Utilizing both life-history and ecological frameworks for approaching non-bacterial components of the microbiome, it is clear that the roles such taxa play within our bodies are underexplored. While this paper sought to touch on these issues, it is limited by both its methodology and its brevity. The use of NCBI PubMed for language analysis suffers from a bias toward medical research, in which microbiome studies often consider the roles of known pathological agents rather than discussing all possible inhabitants in depth. Eukarya is also a domain of organisms highly diverse in physical makeup and metabolic strategy, yet only yeasts were discussed here. The roles of other unicellular parasites (for instance, Toxoplasma gondii) and of helminthic worms, which each form diverse monophyletic groups, can also be explored through the lens of the microbiome’s life history. More research is needed in this area to investigate how the diverse community of microbes within humans evolved and what role it plays even in the absence of pathology.


  1. Benno, Yoshimi, Ken Sawada, and Tomotari Mitsuoka. 1984. “The Intestinal Microflora of Infants: Composition of the Fecal Flora in Breast-Fed and Bottle-Fed Infants.” Microbiology and Immunology 28 (9): 975–86. doi:10.1111/j.1348-0421.1984.tb00754.x.

  2. Chow, Anthony W., Robin Percival-Smith, Karen H. Bartlett, Anita M. Goldring, and Brenda J. Morrison. 1986. “Vaginal Colonization with Escherichia Coli in Healthy Women: Determination of Relative Risks by Quantitative Culture and Multivariate Statistical Analysis.” American Journal of Obstetrics and Gynecology 154 (1): 120–26. doi:10.1016/0002-9378(86)90406-0.

  3. DeLong, Edward F., and Norman R. Pace. 2001. “Environmental Diversity of Bacteria and Archaea.” Systematic Biology 50 (4): 470–78. doi:10.1080/10635150118513.

  4. Dridi, Bédis, Didier Raoult, and Michel Drancourt. 2011. “Archaea as Emerging Organisms in Complex Human Microbiomes.” Anaerobe 17 (2): 56–63. doi:10.1016/j.anaerobe.2011.03.001.

  5. Dutilh, Bas E., Noriko Cassman, Katelyn McNair, Savannah E. Sanchez, Genivaldo G. Z. Silva, Lance Boling, Jeremy J. Barr, et al. 2014. “A Highly Abundant Bacteriophage Discovered in the Unknown Sequences of Human Faecal Metagenomes.” Nature Communications 5: 4498. doi:10.1038/ncomms5498.

  6. Ellis-Pegler, R. B., C. Crabtree, and H. P. Lambert. 1975. “The Faecal Flora of Children in the United Kingdom.” Epidemiology & Infection 75 (1): 135–42. doi:10.1017/S002217240004715X.

  7. Garber, Paul Alan, Elizabeth K. Mallott, Leila M. Porter, and Andres Gomez. 2019. “The Gut Microbiome and Metabolome of Saddleback Tamarins (Leontocebus Weddelli): Insights into the Foraging Ecology of a Small‐bodied Primate.” American Journal of Primatology Upcoming: e23003. doi:10.1002/ajp.23003.

  8. Horz, Hans-Peter. 2015. “Archaeal Lineages Within the Human Microbiome: Absent, Rare, or Elusive?” Life 5 (2): 1333–45. doi:10.3390/life5021333.

  9. Huttenhower, Curtis, Dirk Gevers, Rob Knight, Sahar Abubucker, Jonathan H. Badger, Asif T. Chinwalla, Heather H. Creasy, et al. 2012. “Structure, Function and Diversity of the Healthy Human Microbiome.” Nature 486: 207–14. doi:10.1038/nature11234.

  10. Kourkoumpetis, Themistoklis K., George C. Velmahos, Panayiotis D. Ziakas, Emmanouil Tampakakis, Dimitra Manolakaki, Jeffrey J. Coleman, and Eleftherios Mylonakis. 2011. “The Effect of Cumulative Length of Hospital Stay on the Antifungal Resistance of Candida Strains Isolated from Critically Ill Surgical Patients.” Mycopathologia 171 (2): 85–91. doi:10.1007/s11046-010-9369-3.

  11. Lloyd-Price, Jason, Galeb Abu-Ali, and Curtis Huttenhower. 2016. “The Healthy Human Microbiome.” Genome Medicine 8: 51. doi:10.1186/s13073-016-0307-y.

  12. Mallott, Elizabeth K., and Katherine R. Amato. 2018. “The Microbial Reproductive Ecology of White-Faced Capuchins (Cebus Capucinus).” American Journal of Primatology 80 (8): e22896. doi:10.1002/ajp.22896.

  13. Manolakaki, Dimitra, George C. Velmahos, Themistoklis Kourkoumpetis, Yuchiao Chang, Hasan B. Alam, Marc M. De Moya, and Eleftherios Mylonakis. 2010. “Candida Infection and Colonization Among Trauma Patients.” Virulence 1 (5): 367–75. doi:10.4161/viru.1.5.12796.

  14. Martin, Melanie A., and David A. Sela. 2013. “Infant Gut Microbiota: Developmental Influences and Health Outcomes.” In Building Babies: Primate Development in Proximate and Ultimate Perspective, edited by Kathryn B. H. Clancy, Katie Hinde, and Julienne N. Rutherford, 233–56. New York: Springer. doi:10.1007/978-1-4614-4060-4_11.

  15. NCBI Taxonomy Database. 2019. National Institutes of Health.

  16. Nottingham, P. M., and R. E. Hungate. 1968. “Isolation of Methanogenic Bacteria from Feces of Man.” Journal of Bacteriology 96 (6): 2178–9.

  17. Oren, Aharon, Gunnar Bratbak, and Mikal Heldal. 1997. “Occurrence of Virus-Like Particles in the Dead Sea.” Extremophiles 1 (3): 143–49. doi:10.1007/s007920050027.

  18. Orkin, Joseph Daniel, Fernando A. Campos, Monica S. Myers, Saul E. Cheves Hernandez, Adrián Guadamuz, and Amanda D. Melin. 2019. “Seasonality of the Gut Microbiota of Free-Ranging White-Faced Capuchins in a Tropical Dry Forest.” The ISME Journal 13 (1): 183–96.

  19. Orkin, Joseph Daniel, Shasta Ellen Webb, and Amanda Dawn Melin. 2019. “Small to Modest Impact of Social Group on the Gut Microbiome of Wild Costa Rican Capuchins in a Seasonal Forest.” American Journal of Primatology Upcoming: e22985. doi:10.1002/ajp.22985.

  20. Rani, Jyoti, S. Ramachandran, and Ab. Rauf Shah. 2019. pubmed.mineR: Text Mining of Pubmed Abstracts. Package version 1.0.16.

  21. R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

  22. Sobel, Jack D. 2007. “Vulvovaginal Candidosis.” The Lancet 369 (9577): 1961–71. doi:10.1016/S0140-6736(07)60917-9.

  23. Thauer, Rudolf K., and Seigo Shima. 2006. “Methane and Microbes.” Nature 440: 878–79. doi:10.1038/440878a.

  24. Waller, Alison S., Takuji Yamada, David M. Kristensen, Jens Roat Kultima, Shinichi Sunagawa, Eugene V. Koonin, and Peer Bork. 2014. “Classification and Quantification of Bacteriophage Taxa in Human Gut Metagenomes.” The ISME Journal 8: 1391–1402. doi:10.1038/ismej.2014.30.

  25. Wei, Chih-Hsuan, Alexis Allot, Robert Leaman, and Zhiyong Lu. 2019. “PubTator Central: Automated Concept Annotation for.” Nucleic Acids Research 47 (W1): W587–W593. doi:10.1093/nar/gkz389.

  26. Wickham, Hadley, Jim Hester, and Romain Francois. 2018. readr: Read Rectangular Text Data. Package version 1.3.1.

  27. Woese, Carl Richard. 1998. “The Universal Ancestor.” Proceedings of the National Academy of Sciences 95 (12): 6854–9. doi:10.1073/pnas.95.12.6854.

  28. Woese, Carl Richard, and George Edward Fox. 1977. “Phylogenetic Structure of the Prokaryotic Domain: The Primary Kingdoms.” Proceedings of the National Academy of Sciences 74 (11): 5088–90. doi:10.1073/pnas.74.11.5088.

  29. Wylie, Kristine M., George M. Weinstock, and Gregory A. Storch. 2012. “Emerging View of the Human Virome.” Translational Research 160 (4): 283–90. doi:10.1016/j.trsl.2012.03.006.

  30. Xue, Zhuyi. 2019. “ncbitax2lin: Convert NCBI Taxonomy Dump into Lineages.”

Appendix A

# R script to analyze PubMed abstracts
# Author: Dan Leonard

# Import pubmed analysis library
# (Rani, Ramachandran, and Shah 2019)
library(pubmed.mineR)
# Import file-reading library
# (Wickham, Hester, and Francois 2018)
library(readr)

# Load XML search result file from PubMed into list
abstracts <- xmlreadabs("pubmed_result.xml")

# Load NCBI lineage data into list
# (Xue 2019)
lineages <- read_csv("lineages-2017-03-17.csv",
	col_types = cols(
		.default = col_character(),
		tax_id = col_integer()
	)
)

# Run PMIDs through PubTator
pubtators <- lapply(abstracts@PMID, pubtator_function)
# Remove [" No Data "] from PubTator results
pubtators <- pubtators [! pubtators %in% list(" No Data ")]

# Get list of vectors of species names
# Species names are located in column 5
species <- sapply(pubtators, "[[", 5)
# Remove nulls
species [sapply(species, is.null)] <- NULL
# Flatten
species <- unlist(species)

# Remove species names, leave NCBI numerical ID
species <- sapply(species, function(x) sapply(strsplit(x, ">"), "[[", 2))
# Convert to numeric form
species <- as.numeric(species)
# Remove references to Homo sapiens (NCBI ID 9606)
species.nohuman <- species [! species %in% 9606]

# Get domain names
species.nohuman.domains <-
	lineages$superkingdom[match(species.nohuman, lineages$tax_id)]
# Get phyla names
species.nohuman.phyla <-
	lineages$phylum[match(species.nohuman, lineages$tax_id)]

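# Create data frame with one row per non-human reference
# (the column name "Phylum" is assumed; "Domain" and "Norank" are used below)
phylogeny <- data.frame(
	Domain = lineages$superkingdom[
		match(species.nohuman, lineages$tax_id)
	],
	Phylum = lineages$phylum[
		match(species.nohuman, lineages$tax_id)
	],
	Norank = lineages$`no rank`[
		match(species.nohuman, lineages$tax_id)
	],
	# Keep text columns as character so names can be filled in below
	stringsAsFactors = FALSE
)
# Add additional column "Name"
phylogeny [ , c("Name")] <- NA
# Remove unhelpful term "cellular organisms"
phylogeny$Norank[phylogeny$Norank == "cellular organisms"] <- NA
# Use virus types as name if present
phylogeny$Name <- phylogeny$Norank
# Use phylum as name if present
phylogeny$Name[is.na(phylogeny$Name)] <-
	phylogeny$Phylum[is.na(phylogeny$Name)]
# Use "<domain> (nonspecific)" as name if previous two not present
phylogeny$Name[is.na(phylogeny$Name)] <-
	paste(
		phylogeny$Domain[is.na(phylogeny$Name)],
		"(nonspecific)",
		sep=" "
	)
# Remove phylum and virus type columns
phylogeny <- phylogeny[ , c("Domain", "Name")]
# Use table() to count occurrences of each Domain/Name pair
phylogeny <- as.data.frame(table(phylogeny))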
# Remove extraneous values
phylogeny <- subset(phylogeny, Freq != 0)
# Sort
phylogeny <- phylogeny[order(phylogeny$Domain, -phylogeny$Freq),]

# Create dictionary for looking up colors
colors <- data.frame(
		Colors=c("Red", "Green", "Yellow", "Blue"),
		Domains=c("Archaea", "Bacteria", "Eukaryota", "Viruses"),
		stringsAsFactors = FALSE
)

# Create list of colors matching domains
cols <- colors$Colors[
		match(phylogeny$Domain, colors$Domains)
]

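# Set margins
# (values assumed; the bottom margin is widened for the rotated labels)
par(mar = c(10, 4, 4, 2) + 0.1)
# Print phylum plot
plot <- barplot(
	height = phylogeny$Freq,
	ylab = "log References",
	main = "References to specific phyla in PubMed search for \"Human Microbiome\"",
	col = cols,
	names.arg = phylogeny$Name,
	log = "y",
	xaxt = "n"
)
# Add X-axis labels
# (x and y positions assumed: bar midpoints, bottom edge of the log-scaled plot)
text(
	x = plot,
	y = 10^par("usr")[3],
	labels = phylogeny$Name,
	srt = 45,
	adj = c(1.1,1.1),
	xpd = TRUE,
	cex = 0.8
)
# Add legend for domain colors
# (legend position assumed)
legend(
	"topright",
	legend = colors$Domains,
	fill = colors$Colors
)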