Ecology of Data

In his landmark publication “Animal Ecology,” Charles Elton points out that only after we organize knowledge across a “colossal store of facts” published in books and journals can we understand ecology beyond “clearcut niches which happened to have been worked out” (Fig 1). He wrote about this vision of natural-history fact synthesis as early as 1927. Since then, we’ve made some progress in providing access to observations of organisms recorded by generations of naturalists. For instance, the Global Biodiversity Information Facility (GBIF) indexes over 600 million records of species occurrences (as of Feb 2016) from a wealth of data sources all over the globe. GBIF’s assemblage includes digitized museum specimen collections and digital-native citizen science projects such as iSpot, eBird, and iNaturalist. However, major “shortfalls” exist, especially related to species interactions (“Eltonian shortfall,” Hortal et al. 2015).


Fig 2. GloBI sits in a web of interconnected online services, including evidence literature (red), taxonomies (yellow), ontologies (purple), data providers (green), and search indexes/aggregators (pale blue).

Now that we have a wealth of data describing where organisms hang out, a next logical step is to increase focus on records that describe how species rely on each other. Global Biotic Interactions (GloBI) takes a federated approach to integrating evidence data: rather than serving as a giant data store, GloBI continuously discovers and links facts about species interactions across existing ecological data providers, online taxonomies, citation indexes, and data applications. In a way, GloBI is an actor in a web of mutualistic relationships between journals, online data tools, and citizen-science and research communities (Fig 2). With its finely tuned mission of making interaction data more accessible, GloBI acts as a catalyst for connecting previously siloed communities through the data they share. This linked data makes it easier to discover evidence about specific species interactions (e.g., What do sea otters (Enhydra lutris) eat?) and forges cross-disciplinary collaborations, such as data peer reviews and food-web reconstructions from previously disjointed evidence data (see Poisot et al. 2015).
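To make the idea of linked interaction facts concrete, here is a minimal sketch of how a question like "what does this taxon eat?" could be answered from a list of evidence records. This is not GloBI's implementation or API: the `Interaction` record, the `build_index` and `targets_of` helpers, and the "Taxon A/B/C" placeholder records are all assumptions made for illustration. Only the shark and sea lion record echoes an observation mentioned in this post (Fig 3).

```python
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    """One hypothetical evidence record linking two taxa."""
    source: str     # taxon that acts (e.g. the predator)
    kind: str       # interaction type, e.g. "eats"
    target: str     # taxon acted upon (e.g. the prey)
    reference: str  # where the evidence comes from


def build_index(records):
    """Group records by (source taxon, interaction type) for quick lookup."""
    index = defaultdict(list)
    for record in records:
        index[(record.source, record.kind)].append(record)
    return index


def targets_of(index, source, kind="eats"):
    """Return sorted, de-duplicated target taxa for a source taxon."""
    return sorted({record.target for record in index.get((source, kind), [])})


# Placeholder records; only the first mirrors an observation named in the text.
records = [
    Interaction("Carcharodon carcharias", "eats", "Zalophus californianus",
                "iNaturalist observation"),
    Interaction("Taxon A", "eats", "Taxon B", "placeholder reference 1"),
    Interaction("Taxon A", "eats", "Taxon C", "placeholder reference 2"),
    Interaction("Taxon A", "eats", "Taxon B", "placeholder reference 3"),
]
index = build_index(records)
print(targets_of(index, "Taxon A"))
```

Keeping the reference on every record is the point of the sketch: a dubious interaction can be traced back to whoever contributed it.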


Fig 3. Linkages related to an iNaturalist observation describing a Great White Shark (Carcharodon carcharias) eating a California Sea Lion (Zalophus californianus).

By linking communities through their shared data (Fig 3), focused projects like GloBI can help to improve the resilience, quality, and diversity of the data ecosystem that exists today. And by building an increasingly diverse and healthy data ecosystem, we continue to work toward realizing Charles Elton’s vision and better our understanding of ecology at a global scale.


Pollinating Nerds

A project pitch at Nerds for Nature Summit 2015.

On Saturday 17 October 2015, a herd of nature nerds gathered for the Nerds for Nature Summit 2015 at Manylabs in San Francisco, California. The format of the summit was a scaled-up version of the collaborative Nerds for Nature meetup format: a round of introductions and short project pitches, followed by project-specific breakout discussion groups. At this year’s summit, Global Biotic Interactions was featured as one of the one-minute pitches.

1 minute GloBI pitch

GloBI slide presented at summit

Nerds for Nature and Manylabs facilitate invaluable interactions between folks from various disciplines (e.g. software, hardware, education, design) to develop and mature projects centered around open data and open (citizen) science. Among the many outcomes of my (and therefore GloBI’s) connection to Nerds for Nature and Manylabs were access to a stimulating work environment at Manylabs, the seeds of a project called EDAM (Ecological Data Access Monitor), and an increased cross-pollination of open science/data ideas with individuals and institutions (e.g. Manylabs residents and their guests, Concord Consortium, Hacking Measurement at UC Berkeley, Moore Foundation).

Going forward, Global Biotic Interactions will continue to foster mutualistic relationships within a growing, diverse and social open data/education/science community.

Big thanks to Nerds for Nature for organizing the summit and to Manylabs for hosting the

Massive Open Online Data

GloBI Explorer See-It-All Poster

Baron, Daniela; Caragol, Ri; Furrer, Stefan; Macmurchy, Peter; Stark, Adam (2015): GloBI Explorer Interactive Ecosystem Explorer. figshare. Retrieved on May 23, 2015.

MOOCs are all the rage. An enormous number of courses are now available online for anyone who has time and a networked device. When Battushig Myanganbayar, a 15-year-old high school student from Mongolia, aced MIT’s Circuits and Electronics MOOC, he found himself in the spotlight and was invited to study at the prestigious institution.

The folks at Indiana University take the idea to the next level: the Information Visualization MOOC organized by Katy Börner et al. not only provides free education, it also gets students to collaborate with real projects that give access to open data. I was excited that our project, Global Biotic Interactions (GloBI), was invited to participate in this unique course for a second year (see IVMOOC 2014 project).

GloBI Explorer Paper

Baron, Daniela; Caragol, Ri; Furrer, Stefan; Macmurchy, Peter; Stark, Adam (2015): GloBI Explorer: Interactive Ecosystem Explorer. figshare. Retrieved May 22, 2015.

The IVMOOC-GloBI challenge for this year was to create an engaging experience for high school students to explore food webs in and outside of the classroom. From the start, Daniela Baron, Ri Caragol, Stefan Furrer, Peter MacMurchy, and Adam Stark were eager to learn more about the dataset, suggest improvements, and respond to feedback provided by Jeff Holmes, Marie Studer, and Jen Hammock of the Encyclopedia of Life. I was impressed by what they were able to create in such a short amount of time: a web application, a paper, and a show-it-all poster.

GloBI Explorer Screenshot

GloBI Explorer Screenshot. Retrieved on May 22, 2015.

I think that this year’s IVMOOC project demonstrates the benefits of open data: the use of openly accessible data provides the ability for anyone with an idea and an internet connection to help us better understand the world around us. Not only that, it helps to create Massive Open Online Data (MOOD) communities of citizen scientists and engineers from all over the world to help make the data, and the tools used to access it, increasingly more useful. For example, Sergey Slyusarev, an IVMOOC 2014 alumnus, has identified data issues and is coauthor of rglobi, an R library for accessing GloBI.

Thanks to the IVMOOC class of 2015 (and their organizers) for making this happen!

Tea at Berkeley Institute for Data Science

Doe Memorial Library at UC Berkeley

On Feb 5, 2015, Global Biotic Interactions (GloBI) was the topic of an afternoon Tea talk at the Berkeley Institute for Data Science (BIDS). Located in the historic Doe Memorial Library, BIDS helps to advance data-intensive science across the UC Berkeley campus. One of their many activities is a twice-a-week afternoon tea series where projects are presented and discussed.

Berkeley Institute for Data Science

After a short introduction by BIDS fellow Falk Schuetzenmeister, I introduced GloBI to the 20-30 audience members. Then the more interesting part of the afternoon started: an open discussion! I learned a few things from the discussion. The first is that the audience considered the immediate utility of GloBI to be facilitating data discovery and literature research to help come up with original research questions. In line with this topic, Iryna Dronova suggested creating a real-time data source tracker to help visualize which data sources are available through GloBI.

One of the other discussion topics was how to stimulate ecologists to share data. David Ackerly mentioned that a critical part of the success of GenBank was that the editorial boards of genomics journals collaborated and mandated that data be deposited in this public resource prior to accepting manuscripts for publication. Also, in-person, multi-day workshops were mentioned as a promising method to get cross-disciplinary researchers together to share data and use new tools to help answer meaningful research questions.

GloBI slides presented at BIDS Tea

Sea otters and their lunch were a topic of discussion at BIDS Tea Feb 5, 2015 following a presentation.

Finally, an audience member asked a question: ‘Do otters really eat beavers?’. This told me that my short demo using available GloBI tools assisted in data review – dubious data was identified quickly during a short data excursion in the world of species interactions. My answer to this question was: please look up the source reference and ask the data contributors. After a visit to the EOL Enhydra data tab, I discovered that this interaction was recorded by Joel Sachs et al. (2006).

Now the question remains: Do sea otters (Enhydra) really eat American beavers (Castor canadensis)? Are sea otters that vicious? I hope we’ll find out… I’ll make a point to share this blog post with the authors, in the hope that they can shed some light on the topic.

Many thanks to Falk Schuetzenmeister, Ali Ferguson and the engaged audience for a stimulating afternoon at BIDS.

Update Feb 19, 2015 – After an open discussion with data contributors, the trophic interaction between Sea Otters (Enhydra) and American beavers (Castor canadensis) has been removed. The changes will propagate into GloBI and EOL with some delay.

Eating Pudding

Poelen et al., 2014

Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics.

“The proof of the pudding is in the eating”, is a phrase that stuck out in detailed comments from Jan Willem Henfling on our recent paper (Poelen et al., 2014) in Ecological Informatics. With this, he pointed out that it is important to get the species interaction data into the hands of researchers and educators.

I was happy to read his comments, because it told me that our investment in writing and publishing an open-access paper (at a seemingly hefty price of $2500) is starting to pay off. Also, it highlighted that getting the interaction data out there for anyone to use is not enough: active collaborations are essential to show the use of our project. This is why I wanted to share some recent activities with you.

Participants of NESCent-BHL-EOL Research Sprint on 4-7 February 2014 in Durham, North Carolina. Can you find the author?

After participating in the four-day research sprint organized by NESCent, Biodiversity Heritage Library and Encyclopedia of Life in Durham, North Carolina in February 2014, I have been working with Brian Hayden to use GloBI data to show how dietary niche relates to biodiversity around the globe. Preliminary results are encouraging and a manuscript is in the works. Also, I have continued to work with Jen Hammock (Encyclopedia of Life, Smithsonian Institution) and Jim Simons (Gulf of Mexico Species Interactions, Texas A&M Corpus Christi) to put GloBI data to public use.

Tree-for-All hackathon participants gather to hear a progress report.

In September 2014, I participated in the week-long Tree-for-All hackathon hosted at the University of Michigan and organized by Arbor Workflows and Open Tree of Life. Among many other things, this collaborative event helped create a method to retrieve phylogenetic trees related to species interactions (e.g. Pocket Gophers and Their Parasitic Chewing Lice) using the rglobi (part of rOpenSci) and rotl R libraries.

In the time to come, I am looking forward to continuing to help others eat (or make!) more of that delicious GloBI data pudding! Pudding anyone?

A Food-Web Map of the World

A spatially integrated food web of the world derived from hundreds of thousands of interactions, across tens of thousands of species, and thousands of locations.

Sergey Slyusarev, Dimitrios-Georgios Kontopoulos, William Taysom, Adrian Guzman, and Bimlesh Wadhwa used GloBI data to create a food-web map [1] as part of the Information Visualization MOOC class of 2014 at Indiana University. The map was created by combining interaction data from GloBI’s Darwin Core Archive with terrestrial and marine ecoregions of the world and various openly available taxonomies (e.g., ITIS, NCBI, WoRMS). After eliminating taxa with few recorded interactions, species with similar predator-prey characteristics were grouped by a custom algorithm that was inspired by the Jaccard index, a similarity measure, and based on Infomap, a community-detection algorithm. The resulting interconnected taxa communities were then used to make an information-packed (gorgeous!) food-web visualization. The map was generated with a combination of custom R scripts, existing libraries (e.g., igraph, Reol, rgdal), Cytoscape, and Adobe Illustrator.
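To illustrate the similarity measure that inspired the grouping step, here is a minimal sketch of the plain Jaccard index applied to prey sets. It is not the custom algorithm used by Slyusarev et al., and it leaves out the Infomap-based community detection entirely. The function names, the 0.5 threshold, and the "Predator"/"Prey" placeholder taxa are assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_pairs(prey_by_predator, threshold=0.5):
    """Return predator pairs whose prey sets have Jaccard similarity >= threshold."""
    pairs = []
    for left, right in combinations(sorted(prey_by_predator), 2):
        score = jaccard(prey_by_predator[left], prey_by_predator[right])
        if score >= threshold:
            pairs.append((left, right, score))
    return pairs


# Placeholder taxa, not taken from GloBI.
prey_by_predator = {
    "Predator 1": {"Prey A", "Prey B", "Prey C"},
    "Predator 2": {"Prey A", "Prey B"},
    "Predator 3": {"Prey D"},
}
print(similar_pairs(prey_by_predator))
```

In this toy data, only "Predator 1" and "Predator 2" share enough prey (two of three distinct prey taxa) to pass the threshold, so they would be candidates for the same group.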

Explanation of how color, line width, and node size are used to encode spatial food-web information.

Color encoding of ecoregions around the world, plotted with interaction locations.

I find the integration of spatial information (e.g., marine, terrestrial) in this graph useful because I can quickly relate specific interactions to regions in the world. For instance, I can easily spot a coastal interaction as a filled node that also has a colored border. In addition, the directionality of the interactions is easy to understand thanks to color coding: predator is orange, prey is blue. Opening the high-resolution image in a run-of-the-mill image viewer, I can easily browse the map by zooming and moving with touch-pad gestures. With the help of this visualization, data anomalies in GloBI’s complex data collection were detected, reported through GloBI’s issue list (see here, here, here, and here), and corrected. This alone tells me that the visualization by Slyusarev et al. is a useful research tool.

Special thanks to all GloBI data contributors, Sergey for his suggestions for improving GloBI, and Scott Weingart of Indiana University for inviting GloBI as a client project of IVMOOC 2014. Can’t wait to work with the IVMOOC class of 2015!

[1] Slyusarev, Sergey; Kontopoulos, Dimitrios-Georgios; Taysom, William; Guzman, Adrian; Wadhwa, Bimlesh (2015): Global Biotic Interactions food web map. figshare. Retrieved 03:26, Feb 07, 2015 (GMT)

Exploring Antarctic Interactions Using GloBI’s Interaction Browser


The area selection tool in GloBI’s Interaction Browser provides access to raw data files in addition to a share link. The “show” link updates the visualizations in other parts of the page.

Rugged scientists frequently brave the elements to study who eats what in the frigid yet productive waters of the Southern Ocean. Earlier this year, Ben Raymond was kind enough to share with GloBI the Southern Ocean diet database that he developed with colleagues (Raymond et al. 2011). Having great data is one thing, but being able to (easily) explore the data is a challenge in itself. Enter Göran Bodenschatz, an enthusiastic, passionate web developer. Göran unleashed his skills to create a first pass at GloBI’s Interaction Browser using d3, a JavaScript visualization library, in combination with the GloBI API. His HTML/JavaScript source code is available here.

With Ben’s data and Göran’s tool, we can now “dial-up food webs” (phrase coined by Peter Roopnarine) all across the Antarctic and discover that many species feast on Eurythenes gryllus and its cousin Eurythenes obesus. Not only are the interactions visualized on-the-fly using dependency wheels, you can also access the raw CSV, JSON, or DOT files to do offline analysis. In addition, you can share the selected area with others using a provided Interaction Browser “share” link.

This particular experience tells me that simply collecting and aggregating data is not enough. Only after locating and illuminating data with search and visualization tools can I start to analyze and perhaps understand the biological mechanisms behind the data hidden inside GloBI…


Screenshots of circular diagrams that highlight predatory interactions for Eurythenes gryllus around the Antarctic Peninsula. The left diagram indicates the number of interactions by the width of the arc on the outside of the circle. The right diagram bundles the interactions to help detect highly interacting taxa. In the right diagram, red indicates incoming interactions of the selected taxon (e.g. prey), whereas green indicates outgoing interactions (e.g. predator).