41. Skip to content

41. DCAS GeoParquet

The Dynamic Crop Advisory System (DCAS) dataset provides daily crop advice, growth-stage monitoring, and climate-related metrics.
It is stored as partitioned GeoParquet files in the GAP bucket.


41.1 Prerequisites

Python 3.9+
DuckDB with extensions:

  • httpfs (S3/HTTP access)
  • spatial (GEOMETRY support)

Read-only GAP object storage credentials

41.1.1 Environment variables

export S3_ENDPOINT_URL="https://fra1.digitaloceanspaces.com"
export S3_ACCESS_KEY_ID="YOUR_READ_ONLY_KEY"
export S3_SECRET_ACCESS_KEY="YOUR_READ_ONLY_SECRET"
export S3_BUCKET_NAME="bucket-name"

41.2 Path layout

DCAS is partitioned by country (iso_a3), year, month, and day:

s3://$S3_BUCKET_NAME/staging/dcas_output/iso_a3=KEN/year=/month=/day=*/*.parquet

Example:

s3://gap-products/staging/dcas_output/iso_a3=KEN/year=2025/month=9/day=10/*.parquet

Always pass hive_partitioning=true to make iso_a3, year, month, and day available as columns.

41.3 Column descriptions

The following columns are available in DCAS GeoParquet.

Column name Type Description
planting_date_epoch UINTEGER Epoch timestamp of planting date for the farm.
crop_id USMALLINT Numeric identifier of the crop type.
crop_stage_type_id USMALLINT Identifier for the type of growth stage.
group_id UINTEGER Internal group identifier.
farm_id BIGINT Internal numeric farm identifier.
farm_unique_id VARCHAR External farm identifier (used for joining/filtering).
geometry GEOMETRY Spatial geometry of the farm/grid (polygon or point).
grid_id UINTEGER Numeric identifier for grid cell.
grid_unique_id VARCHAR Unique identifier for grid cell.
registry_id BIGINT Identifier from farm registry.
crop VARCHAR Crop name (e.g. “maize”).
country_id UINTEGER Country identifier (numeric).
grid_crop_key VARCHAR Composite key linking grid and crop.
county VARCHAR County name (administrative boundary).
subcounty VARCHAR Subcounty name (administrative boundary).
ward VARCHAR Ward name (administrative boundary).
preferred_language VARCHAR Preferred language for communication/advisories.
date TIMESTAMP Record timestamp.
prev_growth_stage_id USMALLINT Previous growth stage identifier.
prev_growth_stage_start_date UINTEGER Epoch timestamp when previous growth stage started.
config_id USMALLINT Identifier for model/config version used.
growth_stage_start_date UINTEGER Epoch timestamp when current growth stage started.
growth_stage_id USMALLINT Identifier for current growth stage.
total_gdd DOUBLE Accumulated Growing Degree Days (GDD).
seasonal_precipitation DOUBLE Total precipitation recorded in current season (mm).
temperature DOUBLE Mean temperature for record period (°C).
humidity DOUBLE Mean relative humidity (%) for record period.
p_pet DOUBLE Precipitation / Potential Evapotranspiration ratio.
growth_stage_precipitation DOUBLE Precipitation during current growth stage (mm).
message UINTEGER Encoded advisory message ID.
message_2 UINTEGER Secondary message code (if available).
message_3 UINTEGER Tertiary message code.
message_4 UINTEGER Quaternary message code.
message_5 UINTEGER Quinary message code.
is_empty_message VARCHAR Flag if message is empty (text flag).
has_repetitive_message VARCHAR Flag if advisory is repetitive (text flag).
final_message UINTEGER Final consolidated advisory message ID.
prev_week_message UINTEGER Advisory message from the previous week.
growth_stage VARCHAR Human-readable crop growth stage name.
__null_dask_index__ BIGINT Index column generated by Dask (ignore in most queries).
day BIGINT Day partition (from file path).
iso_a3 VARCHAR ISO Alpha-3 country code (e.g. KEN).
month BIGINT Month partition (from file path).
year BIGINT Year partition (from file path).

41.4 Example queries

41.4.1 Stage a subset

import os, duckdb

conn = duckdb.connect()
conn.install_extension("httpfs"); conn.load_extension("httpfs")
conn.install_extension("spatial"); conn.load_extension("spatial")

dcas_path = (
  f"s3://{os.environ['S3_BUCKET_NAME']}/"
  "staging/dcas_output/iso_a3=KEN/year=*/month=*/day=*/*.parquet"
)

conn.sql(f"""
CREATE OR REPLACE TABLE dcas AS
SELECT *
FROM read_parquet('{dcas_path}', hive_partitioning=true)
WHERE year=2025 AND month=9 AND day=10
""")

41.4.2 Quick checks

-- Count rows
SELECT COUNT(*) FROM dcas;

-- Schema overview
DESCRIBE dcas;

41.4.3 Distinct messages

SELECT DISTINCT final_message FROM dcas;

41.4.4 Messages distribution

SELECT final_message, COUNT(*) AS n
FROM dcas
GROUP BY final_message
ORDER BY n DESC;

41.4.5 Per-farm history

SELECT date, farm_unique_id, crop, growth_stage, final_message
FROM dcas
WHERE farm_unique_id = '4023361'
ORDER BY date DESC
LIMIT 20;