Connecting fish data for open and sustainable science

by Manipal Systems
FishBase, the popular database of fish species

This article was featured in Eurofish Magazine 6 2025.


The world’s most widely used database on fish species, FishBase, is set to become even more useful thanks to the SemantyFish project that will integrate it with other databases to allow complex queries on different fish-related topics.

FishBase is a cornerstone resource for fish research. Despite its importance, it cannot fully support the demands of modern, data-driven science and policy in its current form. The SemantyFish project seeks to transform FishBase from a static collection of database tables into a dynamic, machine-readable and interoperable knowledge graph. This approach enables data to be linked, queried, and integrated with global research infrastructures, opening new possibilities for understanding fish biodiversity, fisheries sustainability, and aquaculture development. The transformation aligns with the global move towards open science and the FAIR (Findable, Accessible, Interoperable, Reusable) data principles.

What is a knowledge graph, and why does it matter?

Think of a knowledge graph as a well-organised fishing net. Each knot in the net represents a piece of information, such as a fish species, its habitat or its biology. The threads connecting the knots show how these pieces of information are related. Instead of leaving isolated bits of data scattered across separate databases, a knowledge graph weaves everything together, making it easier to identify relations, patterns and trends across the entire interconnected network.

In fisheries science, this approach opens new possibilities. Traditional databases can store huge amounts of information, but they often struggle to answer complex questions that span species, ecosystems, biological characteristics and human activities. With a knowledge graph, data from FishBase can be linked to other global sources, such as biodiversity records, environmental monitoring data, and stock and fisheries assessments. This allows researchers to ask questions like “which fish species in the North Atlantic are both commercially important and most sensitive to temperature changes?”
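To make the fishing-net analogy concrete, the sketch below stores a handful of facts as subject-predicate-object triples (the knots and threads) and answers a simplified version of the North Atlantic question by intersecting the species that match each condition. All species identifiers, predicates and values are hypothetical placeholders, not FishBase content, and "most sensitive" is reduced to a simple high/low label. Production knowledge graphs commonly store such statements in RDF and query them with SPARQL; this plain-Python version only illustrates the idea and is not how SemantyFish is implemented.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# All identifiers and values are hypothetical, not FishBase data.
triples = {
    ("species:A", "occursIn", "region:NorthAtlantic"),
    ("species:A", "hasImportance", "commercial"),
    ("species:A", "temperatureSensitivity", "high"),
    ("species:B", "occursIn", "region:NorthAtlantic"),
    ("species:B", "hasImportance", "commercial"),
    ("species:B", "temperatureSensitivity", "low"),
    ("species:C", "occursIn", "region:NorthAtlantic"),
    ("species:C", "temperatureSensitivity", "high"),
    ("species:D", "occursIn", "region:IndianOcean"),
    ("species:D", "hasImportance", "commercial"),
    ("species:D", "temperatureSensitivity", "high"),
}


def subjects(predicate, obj):
    """Return every subject linked to obj through the given predicate."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# Each condition in the question becomes one pattern; the answer is
# the set of species that match all patterns at once.
answer = (
    subjects("occursIn", "region:NorthAtlantic")
    & subjects("hasImportance", "commercial")
    & subjects("temperatureSensitivity", "high")
)
print(sorted(answer))  # ['species:A']
```

Species B fails the sensitivity condition, C is not commercially important and D lives in another region, so only A remains. In a flat table the same question would need several joins; in a graph it is a chain of connected patterns.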

Turning FishBase into a knowledge graph requires much more than simply reorganising the data. It means transforming the way information is structured, connected and shared. At the heart of SemantyFish lies a top-level ontology, used as the conceptual model, which describes the semantics of FishBase resources unambiguously. Through the ontology, it is possible to capture relationships between species, habitats, ecological interactions, bibliographic references, and more. In practice, this means that fish species information is no longer isolated: anybody can trace its connections to other species, ecosystems, and habitats.
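One practical benefit of an explicit ontology is that it lets a system derive facts that were never stored directly. The sketch below assumes two illustrative rules, that "partOf" is transitive and that a species occurring in an area also occurs in every larger area containing it, and applies them repeatedly until nothing new appears. The area names, predicates and rules are hypothetical and are not the axioms of the ontology SemantyFish actually uses.

```python
# Hypothetical triples; names are placeholders, not ontology terms.
facts = {
    ("species:A", "occursIn", "area:Bay1"),
    ("area:Bay1", "partOf", "area:Sea1"),
    ("area:Sea1", "partOf", "area:Ocean1"),
}


def infer(facts):
    """Apply two illustrative rules until no new triples appear:
    1. if R partOf S and S partOf T, then R partOf T (transitivity);
    2. if X occursIn R and R partOf S, then X occursIn S."""
    known = set(facts)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                # Both rules chain a triple into a following partOf triple.
                if b != c or p2 != "partOf":
                    continue
                if p1 in ("partOf", "occursIn"):
                    new.add((a, p1, d))
        if new <= known:  # fixed point reached: nothing new was derived
            return known
        known |= new


closure = infer(facts)
print(("species:A", "occursIn", "area:Ocean1") in closure)  # True
```

The fact that species A occurs in the largest area was never entered, yet it follows from the stored facts and the rules. Real knowledge-graph platforms delegate this kind of reasoning to dedicated reasoners rather than a hand-written loop.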

The main outcomes of SemantyFish

The first and most visible outcome of the project is the knowledge graph itself. The knowledge graph organises species data, biological characteristics, ecological traits, distribution information, and related references in a way that is both human- and machine-readable. This interconnected structure allows researchers to explore complex relationships, uncover patterns across species and ecosystems, run queries that were previously cumbersome or impossible, and derive new facts and knowledge from the data. Moreover, through an evolution workflow, the knowledge graph remains continuously aligned with updates from FishBase, ensuring that it reflects the most current biodiversity information. In essence, static database tables become a dynamic fisheries knowledge base.

The second outcome is a programmatic Application Programming Interface (API) that allows developers to discover and access the contents of the knowledge graph. Through the API, users can retrieve information on various types of entities and integrate it into their own research workflows, applications, or decision-support tools. By providing standardised, automated access, the API ensures that SemantyFish is not a static resource but a living infrastructure that can be widely used, shared, and connected with other services, platforms and infrastructures.

New possibilities for research, management, and innovation

Smarter fisheries management: With SemantyFish, managers can run queries that combine biological, ecological, and distribution data, helping them identify vulnerable species, track shifting populations, or assess the sustainability of particular fisheries. For example, it becomes possible to quickly identify which commercially important species are more at risk from climate-driven habitat changes.

Advancing aquaculture research: Developers and researchers benefit from reliable access to species traits, growth conditions, habitat details, and ecological requirements. By linking this information with environmental data, SemantyFish can support the design of more sustainable farming practices. For example, users could identify freshwater species with traits suitable for low-input farming systems in tropical regions.

Enabling interdisciplinary research: Since the knowledge graph is designed for integration, it can connect FishBase data with broader or related scientific resources, from global biodiversity platforms to climate databases, and food-related data sources. This allows new kinds of interdisciplinary studies, linking fisheries biology with oceanography, climatology, or nutrition research.

Education and capacity building: SemantyFish offers students, educators, and early-career scientists access to a flexible tool that makes complex data easier to explore and understand.

Fostering open science and FAIR data principles: The SemantyFish knowledge graph can integrate with biodiversity portals like GBIF, taxonomic registries such as WoRMS, marine observation platforms like OBIS, and stock and fishery registries such as the Global Record of Stocks and Fisheries (GRSF). This enables holistic analyses of fisheries data that were previously difficult to achieve.

Promoting long-term use: The availability of an API ensures that SemantyFish can be plugged into a wide range of research tools, policy platforms, and decision-support systems. Whether used by scientists modelling the impact of warming seas, policymakers drafting regional fisheries plans, or educators developing training materials, the API is designed for broad, long-term use and collaboration.

Towards smarter fisheries knowledge

As challenges such as climate change, biodiversity loss, and food security grow more pressing, the need for reliable and interoperable fisheries data has never been greater. SemantyFish provides a foundation for meeting this need, offering a resource that will continue to evolve alongside global research infrastructures and community priorities. In doing so, the project affirms a simple but powerful idea: when data becomes knowledge, and knowledge is shared openly, we all gain the tools to build a more sustainable future for fisheries and aquaculture.

SemantyFish is an open science project funded under the OSCARS 1st Call for Open Science Projects and Services. It runs from October 2024 to September 2026. semantyfish.github.io
