49. Preparing and checking a Master List of Taxa before uploading
A taxonomic Master List is a list of all species and/or taxa within a particular group such as birds, fish, invertebrates, wetland plants, algae, etc. This section highlights issues and specific checks to improve accuracy of the Master List. The format of the Master List is important to ensure consistency for ingestion of data into the information system.
Only registered users with super user status, typically the administrators, are able to upload a Master List.
49.1 Creating a Master List
For some groups (e.g. birds, fish) a species list is easy to produce because studies commonly identify organisms to species level. For other groups (e.g. invertebrates, algae), the taxonomic level (family, genus, species, etc.) varies considerably from study to study. For these groups it is recommended that the lowest taxonomic level is used and that Taxon is used in preference to Species.
The purpose of the Master List is threefold:
- To provide a comprehensive and up-to-date list of species/taxa for a specific group in a specific region. This needs to be done during the initial development of an information system such as FBIS. Once the system is up and running, further updating of the taxonomic backbone is done using GBIF and user-defined taxonomic uploads.
- To facilitate downloading of data from the Global Biodiversity Information Facility (GBIF), thereby ensuring that the correct taxa are included on the information system.
- To provide the taxonomic hierarchy for taxa not yet on GBIF.
The generation of a Master List requires consultation with available resources, relevant publications and experts. A Master List is intended to be an updatable resource, improved and added to as new data and studies are published, or new taxa are described. If no species lists are available for a country then the GBIF Taxonomic Master List may be generated by extracting data from GBIF. This Master List should then ideally be checked and validated for accuracy by the Admin team.
The format of the Master List is important to ensure consistency for ingestion of data into the information system. The following columns are included in the Master Lists, which are provided as an Excel template file to be used for each FBIS group (Master List Template Generic.xlsx). It is recommended that all compulsory columns (given in black text) be populated; the other columns (given in blue text) are optional. Explanations are given in parentheses:
- On GBIF (Yes or No if the taxon is on GBIF)
- GBIF link (link to GBIF taxon)
- Taxon Rank
- Kingdom
- Phylum
- Class
- SubClass
- Order
- Family
- SubFamily
- Genus
- Species
- SubSpecies/Variety/Forma
- Taxon
- Taxonomic status (Accepted, Synonym, Doubtful)
- Accepted Taxon
- Scientific name and authority
- Common name (this is derived automatically from GBIF, admin can override if desired)
- Origin (Native, Non-native, Unknown). If a non-native taxon is invasive, then the Invasion Category may be added instead of Non-native
- Invasion (Category 1a invasive, Category 1b invasive, Category 2 invasive, Category 3 invasive). For example, in South Africa, there are different categories for invasive species. These categories are managed in a separate BIMS table: Invasions.
- Endemism (Endemism categories):
  - Micro-endemic level 2 (Endemic to a single river or wetland)
  - Micro-endemic level 1 (Endemic to less than 5 rivers or wetlands)
  - Regional endemic level 2 (Endemic to a single primary catchment)
  - Regional endemic level 1 (Endemic to a single Freshwater Ecoregion (e.g. CFE), more than one primary catchment)
  - National endemic (Endemic to South Africa, occurs in more than one Freshwater Ecoregion within SA)
  - Subregional endemic (Endemic to southern Africa)
  - Widespread (Occurs beyond southern Africa)
  - Unknown (Endemism is unknown)
- Conservation status (Global) - The IUCN Red List of Threatened Species website (IUCN Red List, 2020) classifies species into nine main categories based on their extinction risk. The IUCN category is derived automatically from the IUCN website.
  - Extinct
  - Extinct in the Wild
  - Critically Endangered
  - Endangered
  - Vulnerable
  - Near Threatened
  - Least Concern
  - Data Deficient
  - Not Evaluated
- Conservation status (national) - These can be accommodated if they exist, for example, there is the red list specific to South Africa based on SANBI’s categorisation and classification. See http://redlist.sanbi.org/ and http://redlist.sanbi.org/redcat.php
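A header check along the following lines may help catch renamed or missing columns before upload. This is a minimal sketch: the header spellings are copied from the list above, and whether they match the template file character for character is an assumption that should be verified against Master List Template Generic.xlsx. The function name `compare_header` is hypothetical.

```python
# Column headers as listed above. Their exact spelling in the Excel
# template is an assumption and should be checked against the file.
EXPECTED_COLUMNS = [
    "On GBIF", "GBIF link", "Taxon Rank", "Kingdom", "Phylum", "Class",
    "SubClass", "Order", "Family", "SubFamily", "Genus", "Species",
    "SubSpecies/Variety/Forma", "Taxon", "Taxonomic status", "Accepted Taxon",
    "Scientific name and authority", "Common name", "Origin", "Invasion",
    "Endemism", "Conservation status (Global)", "Conservation status (national)",
]


def compare_header(header):
    """Return (missing, unexpected) column names for a Master List header row."""
    present = [column.strip() for column in header]
    missing = [column for column in EXPECTED_COLUMNS if column not in present]
    unexpected = [column for column in present if column not in EXPECTED_COLUMNS]
    return missing, unexpected


missing, unexpected = compare_header(["Taxon Rank", "Taxon", "Water dependence"])
print("missing:", missing)
print("unexpected:", unexpected)
```

Columns reported as unexpected are not necessarily errors; they may be additional attributes defined for a specific taxon group.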
A separate Master List of Species/Taxa needs to be created for each biodiversity group. The Master List is ideally created before the consolidation of data so that the correct GBIF Taxonomic Backbone (https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c) is used for the data consolidation files. The taxonomy from GBIF should be used when the taxon is on GBIF. The Admin team can check if the taxon is on GBIF using the following link: https://www.gbif.org/species/1 and insert the relevant species, genus, family etc. in the “Select a species” box.
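For long lists, the manual lookup on the GBIF website could be supplemented with a scripted lookup. The sketch below is not part of BIMS; it uses GBIF's public name-matching web service (`/v1/species/match`) as an alternative to the web page, and the helper names are hypothetical. It treats only an exact match as "on GBIF", which is a deliberately cautious assumption: fuzzy or higher-rank matches are reported as No and should be reviewed by hand.

```python
import json
import urllib.parse
import urllib.request

MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"


def build_match_url(name):
    """Build the name-matching request URL for one taxon name."""
    return MATCH_ENDPOINT + "?" + urllib.parse.urlencode({"name": name})


def to_master_list_fields(match):
    """Map a match response (a dict) onto the On GBIF and GBIF link columns.

    Only an EXACT match counts as found. FUZZY and HIGHERRANK matches are
    treated as not found here and should be checked manually.
    """
    key = match.get("usageKey")
    if match.get("matchType") != "EXACT" or key is None:
        return {"On GBIF": "No", "GBIF link": ""}
    return {"On GBIF": "Yes", "GBIF link": f"https://www.gbif.org/species/{key}"}


def lookup(name):
    """Query GBIF for one name (requires network access)."""
    with urllib.request.urlopen(build_match_url(name), timeout=30) as response:
        return to_master_list_fields(json.load(response))


print(build_match_url("Ardea alba"))
```

Calling `lookup("Ardea alba")` would fill the two columns for that taxon; the result should still be checked by the Admin team, since a name can match a taxon other than the intended one.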
Taxa that are not on GBIF may be included in a Master List. Although GBIF is the best available resource, it is not always complete or correct, and several taxa may be missing from it.
There is also another platform that is useful, the Freshwater Animal Diversity Assessment (FADA) Project (http://fada.biodiversity.be/). FADA is the taxonomic backbone for its Freshwater Biodiversity Data Portal. One is able to consult and download FADA data, although it is not always up to date. In the future, Admin will be able to link to FADA.
Important notes and common errors:
It is important that the correct Taxon Rank is always used to ensure correct uploading of the data. Taxon Rank is case sensitive so Species will upload but species will fail. Always ensure the correct Taxon Rank is applied by using the dropdown list. There should be no spaces in SubClass, SubOrder, SubFamily, SubSpecies.
The column On GBIF: if the taxon is on GBIF, this must be Yes. In that case the GBIF link (URL) is not required, but adding it is recommended to ensure that the correct taxon is added. If the taxon is not on GBIF, this must be No. Always include the full taxonomic hierarchy for all taxa (Kingdom, Phylum, Class, Order, etc.).
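The two checks above can also be run on the exported CSV before upload. The following is a sketch only: the set of allowed ranks is assembled from the ranks named in this section, and the dropdown list in the template remains the authoritative source. It also assumes that the On GBIF values must be spelled exactly Yes or No.

```python
import csv
import io

# Assumed rank spellings, taken from the ranks named in this section.
ALLOWED_RANKS = {
    "Kingdom", "Phylum", "Class", "SubClass", "Order", "SubOrder",
    "Family", "SubFamily", "Genus", "Species", "SubSpecies",
}


def check_rank_and_gbif(rows):
    """Return (row_number, message) pairs for rows likely to fail on upload."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        rank = row.get("Taxon Rank", "")
        if rank not in ALLOWED_RANKS:
            problems.append((number, f"unexpected Taxon Rank: {rank!r}"))
        on_gbif = row.get("On GBIF", "")
        if on_gbif not in ("Yes", "No"):
            problems.append((number, f"On GBIF must be Yes or No, got {on_gbif!r}"))
    return problems


# Small in-memory example; a real file would be opened with open(..., newline="").
sample = io.StringIO(
    "Taxon Rank,On GBIF,Taxon\n"
    "Species,Yes,Ardea alba\n"
    "species,Yes,Ardea cinerea\n"
    "Sub Family,,Ardeinae\n"
)
for number, message in check_rank_and_gbif(csv.DictReader(sample)):
    print(f"row {number}: {message}")
```

The comparison is case sensitive on purpose, so that `species` and `Sub Family` are reported in the same way the upload would reject them.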
Note: It is recommended that significant time and resources are used to generate and refine the Master List for each group (birds, fish, invertebrates, etc.) as much as possible before proceeding with data collation. This is the list around which all of the occurrence data will pivot: the more accurate it is at the start, the more time you save in the long run when collating the biodiversity data for those taxa.
49.2 Creating a Master List using BIMS functionality: Harvest species (taxa)
An option to harvest all existing taxa from GBIF has been developed, whereby all existing taxa recorded on GBIF for a particular spatial area (e.g. country) are harvested from GBIF and used to create a master list for a particular biodiversity group. Here are the steps:
- Open the Taxon Management page.
- Click + to Add a new Module.
- Create the label, add the logo icon.
- In the new module, Add a Taxon that you will be using as a parent. For example, in the Mammals taxon group, first manually add the Class Mammalia taxon.
- Approve the new taxon.
- Click "edit" on the taxon group, by clicking the pencil icon in the taxon group list.
- A popup will be displayed. In the bottom field, you can see the GBIF taxonomy; choose the new taxon here. (Type the first few letters to get options).
- Click "Save".
- To harvest GBIF species/taxa, go to Administration, Harvest Species and select the desired biodiversity module. When you harvest taxa, all the taxa related to the parent taxon are harvested. If there is more than one GBIF taxonomy within a module, repeat the process by adding a second new taxon, a third taxon, etc., and harvest taxa sequentially for each. E.g. Reptiles include three classes:
Once an initial master list is created, it can be downloaded as a CSV using the Download button.
The CSV can then be edited in Excel and re-uploaded with extra attributes as needed. See section below.
49.2.1 Adding additional attributes for a specific taxon group
It may be desirable to add attributes for specific taxon groups such as “Water dependence” (Highly dependent, Moderately dependent, Minimally dependent, Terrestrial). These additional attributes are assigned to each taxon during the uploading of the master lists as long as the additional attribute is added in Taxon Management before uploading.
This is done in the Edit Module form, "Add attribute +". The attribute needs to match the attribute column header in your Master List for uploading.
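Because the attribute name and the column header must match, a quick comparison before uploading can catch near misses such as a trailing space or a different capitalisation. The sketch below assumes the match is exact; whether BIMS tolerates differences in case or spacing is not stated here. The function name is hypothetical.

```python
def attribute_header_issues(attributes, header):
    """Compare module attribute names with Master List column headers."""
    issues = []
    # Map a normalised form of each header back to its original spelling.
    normalised = {column.strip().lower(): column for column in header}
    for attribute in attributes:
        if attribute in header:
            continue  # exact match, nothing to report
        near = normalised.get(attribute.strip().lower())
        if near is not None:
            issues.append(
                f"{attribute!r} differs from header {near!r} only in case or spacing"
            )
        else:
            issues.append(f"{attribute!r} has no matching column")
    return issues


header = ["Taxon Rank", "Taxon", "Water dependence "]
for issue in attribute_header_issues(["Water dependence"], header):
    print(issue)
```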
49.3 Checking a Master List for accuracy
To ensure the Master List is accurate, several checks should be made before uploading taxonomic data. After consolidating the Master List, check the following:
Apply filters for checking the data by highlighting the header row and clicking Data, Filter. All columns should be checked for inconsistencies and typos. Work systematically from column A to W. In particular, check consistency of the Taxon Rank and taxonomic hierarchy (Kingdom, Phylum, Class, Order, Family, Genus, Species, SubSpecies, Taxon).
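Part of the hierarchy check can be scripted on the CSV export. The sketch below looks for names that appear under more than one parent, for example a genus listed under two differently spelled families. It assumes the column headers are spelled as in the column list in section 49.1, and it covers only the main ranks; it does not replace the column-by-column review.

```python
from collections import defaultdict

HIERARCHY = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def inconsistent_parents(rows):
    """Find names recorded under more than one parent.

    A blank parent counts as a distinct value, so rows with a missing
    hierarchy level are reported as well.
    """
    seen = defaultdict(set)  # (rank, name) -> set of parent names
    for row in rows:
        for parent_rank, child_rank in zip(HIERARCHY, HIERARCHY[1:]):
            child = (row.get(child_rank) or "").strip()
            parent = (row.get(parent_rank) or "").strip()
            if child:
                seen[(child_rank, child)].add(parent)
    return {key: sorted(parents) for key, parents in seen.items() if len(parents) > 1}


rows = [
    {"Order": "Pelecaniformes", "Family": "Ardeidae", "Genus": "Ardea"},
    {"Order": "Pelecaniformes", "Family": "Ardeidea", "Genus": "Ardea"},  # typo
]
for (rank, name), parents in inconsistent_parents(rows).items():
    print(f"{rank} {name} appears under: {', '.join(parents)}")
```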
It is important to check the GBIF taxonomy for accepted names and synonyms. For example, in the avian master list, Ardea alba is the accepted name, whereas Casmerodius albus is a synonym. Preferably only accepted names should be included in the Master List of Taxa.
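Rows that are not accepted names can be listed for review with a short script. This sketch assumes the Taxonomic status column holds exactly Accepted, Synonym or Doubtful, and that the headers are spelled as in the column list in section 49.1.

```python
def rows_to_review(rows):
    """List rows whose Taxonomic status is anything other than Accepted."""
    flagged = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header
        status = (row.get("Taxonomic status") or "").strip()
        if status != "Accepted":
            accepted = (row.get("Accepted Taxon") or "").strip()
            # A synonym without an accepted name cannot be resolved later.
            flagged.append((number, row.get("Taxon", ""), status, accepted or "MISSING"))
    return flagged


rows = [
    {"Taxon": "Ardea alba", "Taxonomic status": "Accepted", "Accepted Taxon": ""},
    {"Taxon": "Casmerodius albus", "Taxonomic status": "Synonym", "Accepted Taxon": "Ardea alba"},
]
for number, taxon, status, accepted in rows_to_review(rows):
    print(f"row {number}: {taxon} is {status}; accepted taxon: {accepted}")
```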
Taxa should be checked for duplicates by highlighting the Taxon column, and from the Home Menu, selecting Conditional Formatting, Highlight Cells Rules, Duplicate Values.
GBIF URLs should also be checked for duplicates by highlighting the GBIF URL column, and from the Home Menu, selecting Conditional Formatting, Highlight Cells Rules, Duplicate Values.
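The same duplicate checks can be run outside Excel on the CSV export. The sketch below is one way to do it; the header names `Taxon` and `GBIF link` follow the column list in section 49.1 and may need adjusting if the file uses `GBIF URL` instead.

```python
from collections import Counter


def duplicates(rows, column):
    """Return the values that occur more than once in the given column."""
    counts = Counter((row.get(column) or "").strip() for row in rows)
    # Blank cells are ignored: taxa not on GBIF legitimately have no link.
    return sorted(value for value, count in counts.items() if value and count > 1)


rows = [
    {"Taxon": "Ardea alba", "GBIF link": ""},
    {"Taxon": "Ardea cinerea", "GBIF link": ""},
    {"Taxon": "Ardea alba", "GBIF link": ""},
]
for column in ("Taxon", "GBIF link"):
    print(column, "duplicates:", duplicates(rows, column))
```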
Note: All taxa can be updated after ingestion through the Taxon Management section.
Lastly, delete any extra blank rows and columns.
Save the Excel file as a CSV using the following option:
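If the list is already available as a CSV, blank rows and columns can also be removed with a script. This is a sketch only; the file name in the comment is a placeholder, and a column that has a header but no data is kept on purpose, since optional columns may legitimately be empty.

```python
import csv
import io


def drop_blank(table):
    """Remove rows and columns that contain no text at all.

    `table` is a list of rows, each a list of strings, header row first.
    """
    rows = [row for row in table if any(cell.strip() for cell in row)]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    keep = [i for i in range(width) if any(row[i].strip() for row in padded)]
    return [[row[i] for i in keep] for row in padded]


table = [
    ["Taxon Rank", "Taxon", ""],
    ["Species", "Ardea alba", ""],
    ["", "", ""],
]
output = io.StringIO()  # stands in for open("master_list.csv", "w", newline="")
csv.writer(output).writerows(drop_blank(table))
print(output.getvalue())
```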