40. SPW Tamsat GeoParquet¶
The SPW Tamsat dataset is a variant of the Standard Planting Window (SPW) product generated using TAMSAT rainfall estimates.
Data is stored as partitioned GeoParquet files in the GAP bucket.
40.1 Prerequisites¶
Same as SPW.
Python 3.9+
DuckDB with: httpfs and spatial extensions
Read-only object storage credentials (S3_ENDPOINT_URL, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_BUCKET_NAME)
40.2 Path layout¶
Partitioned by year and month:
Example:
40.3 Column descriptions¶
Observed schema (from DESCRIBE SELECT * FROM read_parquet(... ) LIMIT 1):
| # | Column name | Type | Description |
|---|---|---|---|
| 0 | date |
DATE | Record date for the daily TAMSAT-SPW output. |
| 1 | farm_id |
BIGINT | Internal numeric farm identifier. |
| 2 | farm_unique_id |
VARCHAR | External farm identifier used for joins/filters. |
| 3 | country |
VARCHAR | Country name. |
| 4 | farm_group_id |
BIGINT | Internal group identifier. |
| 5 | farm_group |
VARCHAR | Farm group label (e.g., “KALRO 2025 A”). |
| 6 | grid_id |
BIGINT | Numeric grid identifier. |
| 7 | grid_unique_id |
VARCHAR | Unique grid identifier. |
| 8 | geometry |
GEOMETRY | Farm/grid geometry (polygon/point). |
| 9 | latitude |
DOUBLE | Latitude of farm/grid centroid. |
| 10 | longitude |
DOUBLE | Longitude of farm/grid centroid. |
| 11 | sm_25 |
DOUBLE | Soil moisture proxy/indicator at 25th percentile (TAMSAT-derived workflow). |
| 12 | sm_50 |
DOUBLE | Soil moisture proxy/indicator at 50th percentile. |
| 13 | sm_70 |
DOUBLE | Soil moisture proxy/indicator at 70th percentile. |
| 14 | spw_20 |
DOUBLE | SPW decision score at 20% threshold scenario. |
| 15 | spw_40 |
DOUBLE | SPW decision score at 40% threshold scenario. |
| 16 | spw_60 |
DOUBLE | SPW decision score at 60% threshold scenario. |
| 17 | pfc_user_probability |
DOUBLE | User/algorithm probability for PFC (Planting Feasibility Criteria). |
| 18 | wrsi_user_probability |
DOUBLE | User/algorithm probability for WRSI (Water Requirement Satisfaction Index). |
| 19 | pfc_user_decision |
DOUBLE | Decision value derived from PFC probability. |
| 20 | wrsi_user_decision |
DOUBLE | Decision value derived from WRSI probability. |
| 21 | sm_user_decision |
DOUBLE | Decision contribution from soil moisture indicator(s). |
| 22 | pfc_thresh |
DOUBLE | Threshold applied for PFC decisioning. |
| 23 | pfc_prob_thresh |
DOUBLE | Probability threshold applied for PFC. |
| 24 | wrsi_thresh_factor |
DOUBLE | Factor used in WRSI thresholding. |
| 25 | wrsi_prob_thresh |
DOUBLE | Probability threshold applied for WRSI. |
| 26 | year |
BIGINT | Partition year (from folder structure). |
Notes: -
sm_*andspw_*fields capture soil-moisture and scenario-specific SPW indicators derived from TAMSAT inputs. - Partition columns (yearvia hive partitioning; month is in the path) should be used for performant queries.
40.4 Example queries¶
40.4.1 Filter by date¶
SELECT date, farm_unique_id, spw_20, spw_40, spw_60,
pfc_user_probability, wrsi_user_probability
FROM read_parquet(
's3://gap-products/staging/tamsat_spw_geoparquet/year=2025/month=9.parquet',
hive_partitioning=true
)
WHERE date = DATE '2025-09-10'
LIMIT 20;
40.4.2 Filter by farm¶
import os, duckdb
conn = duckdb.connect()
conn.install_extension("httpfs"); conn.load_extension("httpfs")
conn.install_extension("spatial"); conn.load_extension("spatial")
farm_unique_id = '4023361'
path = (
f"s3://{os.environ['S3_BUCKET_NAME']}/"
"staging/tamsat_spw_geoparquet/year=2025/month=9.parquet"
)
conn.sql(f"""
CREATE OR REPLACE TABLE spw_tamsat AS
SELECT *
FROM read_parquet('{path}', hive_partitioning=true)
WHERE date BETWEEN '2025-09-05' AND '2025-09-10'
""")
conn.sql(f"""
SELECT date, farm_unique_id,
spw_20, spw_40, spw_60,
pfc_user_probability, wrsi_user_probability,
pfc_user_decision, wrsi_user_decision, sm_user_decision
FROM spw_tamsat
WHERE farm_unique_id = '{farm_unique_id}'
ORDER BY date DESC
""").show()