You can install the latest version of wxsum directly from this GitHub repository using Stata's net install command:
net install wxsum, from("https://raw.githubusercontent.com/jdavidm/wxsum/main/") replace
The command requires Stata 15 or later. Users working with long time series in Stata/IC may need to split the data into shorter panels due to variable limits.
The wxsum command processes remote sensing rainfall and temperature data and outputs useful statistics. The command can be used with either rainfall or temperature data from any source.
The data must be wide, where each location is a row and each column is a daily reading. Daily weather variable names must be the user-supplied prefix followed by yyyymmdd. For example, if the prefix is rf_, the variable for May 15, 1979 would be rf_19790515.
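As a quick check of the naming rule, the sketch below builds the expected variable name for a date and confirms that it exists. It uses the rf_ prefix from the example above; the toy dataset and the local macro names are illustrative assumptions, not part of wxsum.

```stata
* Toy wide dataset: one row per location, one column per daily reading
clear
set obs 2
generate hhid = _n
generate rf_19790515 = 0

* Build the expected name for May 15, 1979: prefix followed by yyyymmdd
local prefix "rf_"
local vname = "`prefix'" + string(mdy(5, 15, 1979), "%tdCCYYNNDD")
display "`vname'"

* Stops with an error if the variable is absent or not numeric
confirm numeric variable `vname'
```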
Z-scores and deviations from long-run averages are computed strictly against the preceding years, with the window length set by lr_years(). If there is not enough preceding data to fill the requested window, deviations and z-scores are skipped for those initial years, though the other statistics are still generated.
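The rolling benchmark can be illustrated on a toy series of seasonal totals for a single location. This is a minimal sketch rather than the command's internal code: the data are invented, the variable names are hypothetical, and using the sample standard deviation of the window is an assumption.

```stata
* Toy data: 15 seasons of seasonal totals for one location
clear
set obs 15
generate year  = 1979 + _n
generate total = 500 + 10 * mod(_n, 5)
sort year

local lr 10                       // window length, as in lr_years(10)
local start = `lr' + 1
local n = _N

generate lr_mean = .
generate lr_sd   = .
forvalues i = `start'/`n' {
    * Window covers the lr seasons strictly before season i
    local first = `i' - `lr'
    local last  = `i' - 1
    quietly summarize total in `first'/`last'
    quietly replace lr_mean = r(mean) in `i'
    quietly replace lr_sd   = r(sd)   in `i'
}

* Deviation and z-score stay missing for the first lr seasons
generate dev = total - lr_mean
generate z   = dev / lr_sd
list year total dev z
```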
wxsum stubname, type(rain|temp) ini_month(month) fin_month(month) [options]
- type(rain|temp): Specify the data type. Use rain for precipitation data or temp for temperature data.
- ini_month(month): Initial month of the season (e.g., 05 for May).
- fin_month(month): Final month of the season (e.g., 10 for October).
- ini_day(day): Start day of the season. Default is 01.
- fin_day(day): End day of the season. If not specified, it dynamically defaults to the true last calendar day of the final month (handling leap years correctly).
- lr_years(#): Number of strictly preceding years used to calculate rolling deviations and Z-scores. Default is 10. Max is 50.
- rain_threshold(#): Threshold for defining a rainy day. Defaults to 1. Missing rainfall values are excluded from calculations.
- gdd_lo(#): Lower bound for growing degree days calculation. Required if type(temp) is used.
- gdd_hi(#): Upper bound for growing degree days calculation. Required if type(temp) is used.
- kdd_base(#): Temperature threshold for calculating Killing Degree Days (KDD). Required if type(temp) is used.
- gdd_bin(#): Width of fixed-interval seasonal GDD bins. When specified, creates one integer categorical variable gddcat_YYYY for each generated GDD season. Requires type(temp).
- gdd_binlo(#): Lower endpoint for regular fixed-width GDD intervals. Default is 0 when gdd_bin() is specified. Values below this threshold are assigned to a bottom-coded category. Requires gdd_bin().
- gdd_binhi(#): Upper endpoint for regular fixed-width GDD intervals. Values at or above this threshold are assigned to a top-coded category. When omitted, the command automatically extends intervals to cover the empirical maximum. Requires gdd_bin().
- tmp_bin(#): Total number of daily temperature bin count variables to create per season. Must be a positive integer from 1 to 42. Requires type(temp), tmp_binlo(), and tmp_binhi().
- tmp_binlo(#): Lower bound of the temperature range for bin construction. Required with tmp_bin(). Must be in the same units as the daily temperature data.
- tmp_binhi(#): Upper bound of the temperature range for bin construction. Required with tmp_bin(). Must be greater than tmp_binlo().
- shape(wide|long): Shape of the final output. Default is wide. When long is specified, output is stacked with one row per retained unit-year and a variable named year. It is strongly recommended to use keep() with unit identifiers when shape(long) is requested.
- keep(varlist): Variables to keep in the final dataset along with the generated wxsum variables.
- save(filename): File path to save the resulting dataset.
For data structured as described above, wxsum computes the same set of statistics for every season year it detects. The command handles seasons that span calendar years, such as November to February, and labels the output with the year in which the season starts.
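The labeling rule for a season that crosses the calendar year can be sketched as follows. This is an illustration of the rule as described, not the command's own code; the toy dates and variable names are hypothetical, and the logic shown applies only when the initial month is later in the calendar than the final month.

```stata
* Toy daily dates around a November-to-February season
clear
input str8 ymd
"19901101"
"19901231"
"19910115"
"19910228"
"19910301"
end
generate date = date(ymd, "YMD")
format date %td

local ini 11    // as in ini_month(11)
local fin 2     // as in fin_month(02)

* A day is in season if it falls in or after ini, or in or before fin
generate in_season = month(date) >= `ini' | month(date) <= `fin'

* Days in the January-February tail belong to the season that began the year before
generate season_year = year(date) - (month(date) <= `fin') if in_season

* Every in-season day here belongs to the season starting in 1990
assert season_year == 1990 if in_season
list ymd in_season season_year
```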
When type(rain) is specified, the command generates the following variables for each season:
- mean daily rainfall in a season
- median daily rainfall in a season
- variance of daily rainfall in a season
- standard deviation of daily rainfall in a season
- skew of daily rainfall in a season
- mean of monthly rainfall totals in a season
- deviation from long run average of mean monthly rainfall in a season
- z-score of mean monthly rainfall in a season
- total seasonal rainfall
- deviation from long run average of total seasonal rainfall
- z-score of total seasonal rainfall
- number of observed days with rain in a season
- deviation from long run average of rainy days in a season
- z-score of rainy days in a season
- number of observed days without rain in a season
- deviation from long run average of days without rain in a season
- z-score of days without rain in a season
- percentage of days with rain in a season
- deviation from the long run average of percentage of days with rain in a season
- z-score of percentage of days with rain in a season
- length of leading dry spell at the start of a season
- longest mid-season dry spell
- length of trailing dry spell at the end of a season
When type(temp) is specified, the command generates the following variables for each season:
- mean daily temperature in a season
- median daily temperature in a season
- variance of daily temperature in a season
- standard deviation of daily temperature in a season
- skew of daily temperature in a season
- max daily temperature in a season
- GDD in a season
- deviations from long run average GDD in a season
- z-score of GDD in a season
- GDD category variable gddcat_YYYY (when gdd_bin() is specified)
- KDD in a season
- deviations from long run average KDD in a season
- z-score of KDD in a season
- daily temperature bin count variables tmpbinXX_YYYY (when tmp_bin() is specified)
Growing degree days are calculated as capped degree accumulation between gdd_lo(number) and gdd_hi(number): min(max(temp - gdd_lo, 0), gdd_hi - gdd_lo), summed over the season. Killing degree days are calculated above a user specified kdd_base(number). As with the rainfall option, the temperature option also generates deviations in GDD and KDD from the long-term average and the deviation measured as a z-score.
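The capped accumulation can be worked through on a handful of toy daily readings. This sketch applies the GDD formula stated above; the KDD form (degrees above kdd_base, zero otherwise) and the treatment of missing days as contributing nothing are assumptions, and all variable names are hypothetical.

```stata
* Toy daily temperatures for one location, including one missing day
clear
input day temp
1 5
2 12
3 25
4 35
5 .
end

local gdd_lo   8
local gdd_hi   32
local kdd_base 32

* Daily GDD: min(max(temp - gdd_lo, 0), gdd_hi - gdd_lo)
* Stata's max() ignores missing arguments, so missing days are guarded explicitly
generate gdd_day = min(max(temp - `gdd_lo', 0), `gdd_hi' - `gdd_lo') if !missing(temp)

* Assumed daily KDD: degrees above kdd_base
generate kdd_day = max(temp - `kdd_base', 0) if !missing(temp)

* Seasonal sums; egen total() treats missing days as zero
egen gdd = total(gdd_day)
egen kdd = total(kdd_day)

* 0 + 4 + 17 + 24 = 45 growing degree days; 3 killing degree days
assert gdd == 45
assert kdd == 3
list
```

The day at 35 degrees contributes the cap of 24 to GDD rather than 27, which is the effect of the gdd_hi bound.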
When gdd_bin(number) is specified, the command creates an integer categorical variable gddcat_YYYY for each season that identifies the fixed-width interval containing the seasonal GDD total. Value labels define the GDD intervals (e.g., GDD [50,150), GDD [150,250)). Users can employ Stata's factor-variable notation such as i.gddcat_YYYY to create dummies in estimation commands. This follows the fixed-interval seasonal degree-day approach used in Deschênes and Greenstone-style specifications, while remaining unit agnostic. The bin width should be specified in the same units as the generated GDD variable.
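A fixed-width assignment of this kind can be sketched as below, using a width of 100 with endpoints 50 and 950. The category numbering (0 for the bottom-coded group, then 1 upward) and the variable names are assumptions made for illustration; the codes and value labels that wxsum actually assigns may differ.

```stata
* Toy seasonal GDD totals
clear
input gdd
20
50
149
150
949
950
end

local width 100
local lo    50
local hi    950
local nint = (`hi' - `lo') / `width'    // 9 regular intervals

* Regular intervals [lo + (k-1)*width, lo + k*width) get code k
generate int gddcat = 1 + floor((gdd - `lo') / `width')

* Bottom-coded below lo, top-coded at or above hi
replace gddcat = 0          if gdd <  `lo'
replace gddcat = `nint' + 1 if gdd >= `hi' & !missing(gdd)

* 149 falls in [50,150) and 150 starts [150,250)
assert gddcat == 1 if gdd == 149
assert gddcat == 2 if gdd == 150
list
```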
When tmp_bin(number) is specified, the command creates daily temperature bin count variables tmpbin01_YYYY through tmpbinJJ_YYYY for each season. These count the number of nonmissing daily temperature readings falling into fixed temperature intervals, in the spirit of Schlenker-Roberts. When only one daily reading is available, the entire day is assigned to the bin containing that reading. The command is unit agnostic; tmp_binlo() and tmp_binhi() must be in the same units as the daily temperature data. For J >= 3, the lower tail counts days with T < lo, interior bins cover equal-width intervals over [lo, hi), and the upper tail counts days with T >= hi. Missing daily temperatures are not counted.
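The bin layout for J >= 3 can be sketched with J = 15, lo = 0, and hi = 39. The sketch assumes that the J bins consist of one lower tail, J - 2 equal-width interior bins, and one upper tail, which gives 13 interior bins of width 3 here; the variable names are hypothetical.

```stata
* Toy daily temperatures, including one missing day
clear
input temp
-2
0
2.5
3
38.9
39
.
end

local J  15
local lo 0
local hi 39
local w = (`hi' - `lo') / (`J' - 2)    // interior bin width

* Interior bins are numbered 2 through J - 1
generate int bin = 2 + floor((temp - `lo') / `w') if !missing(temp)

* Lower tail: T < lo; upper tail: T >= hi
replace bin = 1   if temp <  `lo'
replace bin = `J' if temp >= `hi' & !missing(temp)

* Day counts per bin; the missing day is left out
tabulate bin
```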
When shape(long) is specified, the final output is stacked long with one row per retained unit-year. Generated variables have their _YYYY suffixes stripped and a variable year identifies the season. This is a final-output stacking operation; the wide input requirement is unchanged. It can make panel workflows easier and reduce the final number of variables, although it does not yet reduce the peak number of variables created internally.
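The stacking step resembles a standard reshape from wide to long. The sketch below shows this on a toy dataset; the variable names total_YYYY and mean_YYYY are hypothetical stand-ins and not the names wxsum generates.

```stata
* Toy wide output: two units, two season years, two statistics
clear
input hhid total_1990 total_1991 mean_1990 mean_1991
1 510 480 2.8 2.6
2 620 590 3.4 3.2
end

* Move the YYYY suffix into a year variable: one row per hhid-year
reshape long total_ mean_, i(hhid) j(year)
rename (total_ mean_) (total mean)

* Two units times two seasons gives four rows
assert _N == 4
list
```

Keeping a unit identifier such as hhid is what makes the stacked rows attributable to a location, which is why keep() matters with shape(long).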
To try the command out on the sample datasets included in this repository:
Rainfall Example:
use rain.dta, clear
wxsum rf_, type(rain) ini_month(05) fin_month(10) ini_day(15) fin_day(15) rain_threshold(1) keep(hhid) save(rainfall_stats.dta)
Temperature Example:
use temp.dta, clear
wxsum tmp_, type(temp) ini_month(11) fin_month(02) gdd_lo(8) gdd_hi(32) kdd_base(32) keep(hhid)
GDD Categories Example:
use temp.dta, clear
wxsum tmp_, type(temp) ini_month(04) ini_day(01) fin_month(09) fin_day(15) gdd_lo(18) gdd_hi(30) kdd_base(32) gdd_bin(100) gdd_binlo(50) gdd_binhi(950) keep(hhid)
Daily Temperature Bin Counts Example:
use temp.dta, clear
wxsum tmp_, type(temp) ini_month(04) ini_day(01) fin_month(09) fin_day(15) gdd_lo(18) gdd_hi(30) kdd_base(32) tmp_bin(15) tmp_binlo(0) tmp_binhi(39) keep(hhid)
Changing the Long-Run Benchmark:
use rain.dta, clear
wxsum rf_, type(rain) ini_month(05) fin_month(10) lr_years(20) keep(hhid)
Long Output for Panel Workflows:
use rain.dta, clear
wxsum rf_, type(rain) ini_month(05) fin_month(10) ini_day(15) fin_day(15) rain_threshold(1) keep(hhid) shape(long)
If you run into any issues or bugs, please open an issue on the GitHub repository. Be sure to include your exact Stata version, the command you ran, and a sample of your data (or ideally, reproduce the bug using rain.dta or temp.dta).
If you use wxsum in your research, please cite:
Michler, J. D., A. Josephson, O. Barriga-Cabanillas, A. Michuda, and J. C. Oliver. "wxsum: A command for processing temperature and precipitation data." https://github.com/jdavidm/wxsum, version 5.0.
- Jeffrey D. Michler
- Anna Josephson
- Oscar Barriga-Cabanillas
- Aleksandr Michuda
- Jeffrey C. Oliver