37. Skip to content

37. Preparing and checking an Occurrence Data File before uploading

37.1 Extracting data and preparing bulk data for uploading into FBIS

FBIS includes data capture forms for adding sites, fish, invertebrate and algae data, as well as associated abiotic data. However, these forms are intended for the capture of individual site visits, and at times, especially during the development phase of an information system, it is useful to be able to upload large amounts of data at a time. To ensure that bulk data are readily uploaded into FBIS, a standardised Data Upload Template needs to be produced for each group. This will ensure that data are cleaned and provided in a standardised manner so that the data uploading process runs smoothly and so that the resultant FBIS platform serves quality data.

37.2 Data Upload Templates

Standardised data upload templates in excel have been generated for four biodiversity groups, namely anurans, fish, invertebrates and algae. Additional upload templates may be added as new biodiversity modules are added. These templates include a number of dropdown attributes. It is critical that all taxa in the Data Upload file for a particular biodiversity group are present in the Master List of Taxa for that group. If not, the upload process will not work. For this reason, it is recommended that where possible dropdown lists are used in the Data Upload files to ensure that all data entered in the sheet are valid. As an example, the Master List of fish species in South Africa has been included as dropdown list for the FBIS Fish Data Upload Template.

The following data upload templates are provided:

  • FBIS Fish Data Upload Template.xlsx
  • FBIS Algae Data Upload Template.xlsx
  • FBIS Invertebrate Data Upload Template.xlsx
  • FBIS Anuran Data Upload Template.xlsx

Many of the columns are the same for each group, although some additional columns are included for algae, and the dropdown options (e.g. biotopes) sometimes vary amongst groups. The common columns used in each group are given in Table 1. Relevant dropdown lists are provided in each template and additional group-specific columns are included in the relevant templates. Those column headers in red have to be filled in the occurrence upload file. See section on Check the Source Reference for details of which columns are relevant for different reference categories. Those column headers shaded green use dropdown lists in the excel file. These can be modified in the Admin page (BIMS).

Table 1. Column headers with details used in data upload templates

Column Header Details
UUID A unique identifier for each record. Drag and copy the formula down.
Original River or Wetland Name Name given in study reference
Original Site Code Site Code given in study reference
Site Code Leave blank (autogenerated)
Site description Description given in study reference
Refined Geomorphological Zone As given in study reference (if available) - Select from dropdown options
Latitude Decimal degrees with "-" in front & "," for decimal point
Longitude Decimal degrees with "," for decimal point
Sampling Date yyyy/mm/dd
Kingdom Not necessary to fill in as they are derived from the Master Lists
Phylum Not necessary to fill in as they are derived from the Master Lists
Class Not necessary to fill in as they are derived from the Master Lists
Order Not necessary to fill in as they are derived from the Master Lists
Family Not necessary to fill in as they are derived from the Master Lists
Genus Not necessary to fill in as they are derived from the Master Lists
Species Not necessary to fill in as they are derived from the Master Lists
Taxon
Taxon rank Select from dropdown options
Present Fill in with "1"
Sampling method Select from dropdown options
Sampling effort measure Select from dropdown options
Sampling effort value
Abundance measure Select from dropdown options
Abundance value Leave blank if only presence data
Record type Select from dropdown options
Broad biotope Select from dropdown options
Specific biotope Select from dropdown options
Substratum Select from dropdown options
Water Level Select from dropdown options
Water Turbidity Select from dropdown options
Embeddedness Select from dropdown options
Depth Depth (m)
Near Bed Velocity Near Bed Velocity (m/s)
COND Conductivity (mS/cm)
PH PH
DOPER Dissolved Oxygen (%)
DO Dissolved Oxygen (mg/L)
TEMP Temperature (deg C)
TURB Turbidity (NTU)
Collector/Owner Full name
Collector/Owner Institute Institute of collector/owner
Author(s) Surname + Initials
Year Year of study
Source Source of data if from thesis or database
Title Title of data source if thesis or published report
Reference category Select from dropdown options
URL Thesis handle, or article link if no DOI available
DOI DOI for reference ( e.g. 10.2989/16085914.2018.1491385)
Document Upload Link Link to "Documents" page on FIBbio after uploading PDF of published report
Notes Any details related to sampling, species, life form etc.

It is recommended that separate data upload files be created for each Source Reference, with each linked to a GitHub ticket. This ensures accurate management of data preparation and management.

37.3 Preparing and checking an Occurrence Data File

To ensure that data are accurate, several steps should be taken before uploading occurrence data. After consolidating the occurrence data in the data file, you should check the following:

Apply filters for checking the data by highlighting the header row, clicking Data, Filter

Occurrence Data Preparation 1

UUID: This is a unique id for each occurrence record. It needs to be copied and pasted so that the formula used to generate it is saved as a number.

The UUID formula is available here:

=LOWER(CONCATENATE(DEC2HEX(RANDBETWEEN(0,POWER(16,8)),8),"-",DEC2HEX(RANDBETWEEN(0,POWER(16,4)),4),"-","4",
DEC2HEX(RANDBETWEEN(0,POWER(16,3)),3),"-",DEC2HEX(RANDBETWEEN(8,11)),DEC2HEX(RANDBETWEEN(0,POWER(16,3)),3),
"-",DEC2HEX(RANDBETWEEN(0,POWER(16,8)),8),DEC2HEX(RANDBETWEEN(0,POWER(16,4)),4)))

Systematically check each column using the dropdown arrows, and look for inconsistencies. Some common issues include, #num in UUID column instead of the UUID, incorrect spelling in the Site description column (e.g. Gakiriro wetland, Gakirirowetland), latitude with missing “-“ (e.g. 2.60059 as latitude is incorrect – should be -2.60059), longitude.

Also check that all sites fall within the country boundary so that Site Codes may be generated correctly and geocontext data harvested for each site.

Occurrence Data Preparation 2 Occurrence Data Preparation 3

Check that the sampling date is in the correct format: yyyy/mm/dd. If the format is not correct then the upload will not work.

Check that all taxa are correct and are present in the Master List. If the dropdown of master taxa list was used then this should not be an issue. Check that the Taxon rank is correct. It is important that the correct Taxon Rank is always used to ensure correct uploading of the data files. Taxon Rank is case sensitive so Species will upload but species will fail. Always ensure the correct Taxon Rank is applied by using the dropdown list. There should be no spaces in SubClass, SubOrder, SubFamily, SubSpecies.

Occurrence Data Preparation 4

Check presence is all “1”, check sampling method is correct.

Occurrence Data Preparation 5 Occurrence Data Preparation 6

Check Collector/Owner and Collector/Owner Institute. Ideally CAPITALS should not be used, First name Surname if known. Do not use middle initial and punctuation.

Occurrence Data Preparation 7

Check the Source Reference (Author(s), Year, Source, Title, Reference category, URL, DOI, Document Upload Link). For each study reference type, you need to populate the following columns:

  • Peer-reviewed scientific article (Collector/Owner; Collector/Owner Institute; Author(s); Year; Source; Title; DOI or URL (if DOI is not available)). For Peer-reviewed scientific article the Source is the Journal, For Peer-reviewed scientific article the Title is the title of the article.
  • Published report (Collector/Owner; Collector/Owner Institute; Author(s); Year; Source; Title; URL or Document Upload Link). Note the Document Upload Link is obtained after the report is added.
  • Thesis (Collector/Owner; Collector/Owner Institute; Author(s); Year; Source; Title; URL or Document Upload Link)
  • Database (Collector/Owner; Collector/Owner Institute; Author(s); Year; Source)
  • Unpublished data (Collector/Owner; Collector/Owner Institute; Author(s); Year; Source)

Check format of Author(s). It needs to be: Surname + Initials, no punctuation. (e.g. Tumushimire L, Mindje M, Sinsch U & Dehling JM not Lambert Tumushimire, Mapendo MINDJE, Prof. Ulrich Sinsch & Julian Maxmillian Dehling). It is important to get the authors correct (e.g. Sinsch Ulrich and Dehling, J. Maximilian, Lümkemann Katrin, Rosar Katharina, Christiane Schwarz should be Sinsch U, Lümkemann K, Rosar K, Schwarz C & Dehling M as per the doi).

Occurrence Data Preparation 8

Check the Date: This is the publication date (so 2012-2013 should be 2019 as this is when the article was published - Ecology and Evolution. 2019. Same with all other data from this study).

Occurrence Data Preparation 9

Check the Source. Please note when to include source or not, and what to include. (e.g. Mindje, M., Tumushimire, L., & Sinsch, U. (2020). Diversity assessment of anurans in the Mugesera wetland (eastern Rwanda): impact of habitat disturbance and partial recovery. Salamandra, 56, 27-38. Should be Salamandra)

  • For Peer-reviewed scientific articles - the Source is the Journal.
  • For Published Reports and Theses - the Source is the publisher of the Report.
  • For Unpublished Data - the source is the title of the study.

Occurrence Data Preparation 10

Check the Title. For Peer-reviewed scientific article the Title is the title of the article, for Published reports or theses, it is the title of the thesis. Unpublished data don’t need a title.

Occurrence Data Preparation 11

Check all Reference Categories are correct: options include:

  • Database
  • Peer-reviewed scientific article
  • Published report
  • Thesis
  • Unpublished data

Occurrence Data Preparation 12

Check URL and DOI. Use a DOI if it is available, URL – only needed for Peer-reviewed scientific article if there is no DOI. For the DOI you only need to include the number part, so 10.1080/15627020.2012.11407524, not https://doi.org/10.1080/15627020.2012.11407524.

Occurrence Data Preparation 13 Occurrence Data Preparation 14

Check the document upload link is correct. Note the Document Upload Link is obtained after the report is added. Reports are only uploaded when there is no DOI or URL to link the data to.

Checking for duplicate occurrence records. Use this formula for checking for duplicates. This is a combination of Site description, latitude, longitude, sampling date, Taxon, sampling method, author, year, source and title. Copy and paste the formula below into a new column at the end and name it “Duplicate check”.

=CONCATENATE(E2,G2,H2,I2,Q2,T2,W2,X2,Y2,Z2)

Then copy and paste the formula down to the end of the data rows. Then Highlight the column, and from the Home menu, select Conditional Formatting, Highlight Cells Rules, Duplicate Values.

Occurrence Data Preparation 15

Any duplicates will be highlighted. Check and delete duplicate occurrence records. Then delete the Duplicate Check column.

Delete blank rows and columns. Lastly, ensure that there are no extra blank rows or columns, by deleting them.

Occurrence Data Preparation 16

Occurrence Data Preparation 17

Remove the data filter, save the file in excel, and save the file as csv file.

Occurrence Data Preparation 18