--- the_regionalised_data_base_capreg [2020/02/07 11:30] – [Methodology applied in the regional data consolidation] matsz
+++ the_regionalised_data_base_capreg [2020/02/11 08:47] – matsz
@@ Line 62: / Line 62: @@
 === Activity Levels===
+Where data on regional activity levels are missing, a linear trend line is estimated for the regional and Member State time series when the regional database is built. The gap is then filled with a weighted average of two components: the trend line, with a weight of R², and a weighted average of the available observations around the gap, with a weight of 1-R². This formulation has the following properties. If a time series follows a strong trend, the back-cast and forecast values are dominated by the trend because R² is high. As R² decreases, the estimated values are pulled towards the known observations.
+Apart from gap filling, another problem is that annual cropland statistics at the regional level cover only a few crop activities (cereals with wheat, barley, grain maize, rice; potatoes, sugar beet, oil seeds with rape and sunflower; tobacco, fodder maize; grassland, permanent crops with vineyards and olive plantations). The COCO data base, however, covers some 30 different crop activities. To break these aggregates down to COCO definitions, the national shares within each aggregate are used.
+As an example, this approach is explained for cereals. Data on the production activities WHEA (wheat = SWHE+DWHE), BARL (barley), MAIZ (grain maize) and PARI (paddy rice) as found in COCO directly match the level of disaggregation in the regional data. Therefore, the mapped regionalized data are set equal to the corresponding values in the regional “raw” data. The difference between the sum of these 4 activities and the aggregate data on cereals in the regional raw data must be equal to the sum of the remaining cereal activities in COCO, namely RYE (rye and meslin), OATS (oats) and OCER (other cereals). As long as no other regional information is available, this difference from the regional raw data is broken down by applying national shares.
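+The blending rule can be illustrated with a minimal sketch. It is not the CAPRI implementation: the function name ''fill_gaps'', the parameter ''n_neighbours'' and the inverse-distance weighting of the observations around the gap are assumptions made for illustration only, since the text does not specify how the neighbouring observations are weighted.

```python
def fill_gaps(years, values, n_neighbours=2):
    """Fill missing values (None) with an R²-weighted blend of a linear
    trend and a local average of nearby observations (illustrative only)."""
    known = [(t, v) for t, v in zip(years, values) if v is not None]
    n = len(known)
    t_mean = sum(t for t, _ in known) / n
    v_mean = sum(v for _, v in known) / n

    # Ordinary least-squares trend line fitted to the known observations.
    s_tt = sum((t - t_mean) ** 2 for t, _ in known)
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in known)
    slope = s_tv / s_tt
    intercept = v_mean - slope * t_mean

    # R² of the trend line; a constant series gets R² = 0.
    ss_res = sum((v - (intercept + slope * t)) ** 2 for t, v in known)
    ss_tot = sum((v - v_mean) ** 2 for _, v in known)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    filled = []
    for t, v in zip(years, values):
        if v is not None:
            filled.append(v)
            continue
        # Assumption: "observations around the gap" are the nearest known
        # points, averaged with inverse-distance weights.
        nearest = sorted(known, key=lambda kv: abs(kv[0] - t))[:n_neighbours]
        weights = [1.0 / abs(tk - t) for tk, _ in nearest]
        local = sum(w * vk for w, (_, vk) in zip(weights, nearest)) / sum(weights)
        # Weight R² on the trend, 1 - R² on the local average.
        filled.append(r2 * (intercept + slope * t) + (1.0 - r2) * local)
    return filled, r2
```

+With a strictly linear series, R² equals 1 and the filled value lies on the trend line. With a noisy series, R² is low and the filled value follows the neighbouring observations instead.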
+The approach is shown for OATS in the following equations, where the suffix r stands for regional data:
+\begin{align}
+\begin{split}
+LEVL_{OATS,r} &= (CEREAL_r\\
+&\quad -WHEAT_r-BARLEY_r-MAIZEGR_r-RICE_r)\cdot\\
+&\quad\frac{LEVL_{OATS,COCO}}{(LEVL_{OATS,COCO}+LEVL_{RYE,COCO}+LEVL_{OCER,COCO})}
+\end{split}
+\end{align}
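+The breakdown in the equation above can be sketched as follows. The function name and all numbers are hypothetical and serve only to make the arithmetic concrete; the keys mirror the symbols used in the equation.

```python
RAW_CEREALS = ("WHEAT", "BARLEY", "MAIZEGR", "RICE")  # reported regionally
RESIDUAL_CEREALS = ("OATS", "RYE", "OCER")            # only known nationally


def split_residual_cereals(regional_raw, national_levl):
    """Distribute the regional cereal residual over OATS, RYE and OCER
    using their national (COCO) shares."""
    residual = regional_raw["CEREAL"] - sum(regional_raw[c] for c in RAW_CEREALS)
    national_total = sum(national_levl[a] for a in RESIDUAL_CEREALS)
    return {a: residual * national_levl[a] / national_total
            for a in RESIDUAL_CEREALS}


# Hypothetical regional raw data and national COCO levels (1000 ha).
region = {"CEREAL": 100.0, "WHEAT": 40.0, "BARLEY": 30.0,
          "MAIZEGR": 15.0, "RICE": 5.0}
coco = {"OATS": 50.0, "RYE": 30.0, "OCER": 20.0}
levels = split_residual_cereals(region, coco)
```

+In this example the residual of 10 is split into 5 for OATS, 3 for RYE and 2 for OCER, so the three activities add up to the residual by construction.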
+Similar equations are used to break down other aggregates and residual areas in the regional data ((If no data at all are found, the share in the utilisable agricultural area is used.)). The Farm Structure Survey (FSS) provides crop areas for a larger number of crops, but this survey is usually conducted only every three years. FSS data, when available, are also used to approximate crop areas at regional level.
+One important advantage of the approach is that the resulting areas are automatically consistent with the national data, provided the incoming information from REGIO was consistent with the national level. Fortunately, the regional information on herd sizes covers most of the data needed to derive good proxies for all animal activities in COCO definition. The regional breakdown of herd sizes is often more detailed than COCO, at least for the important sectors. Regional estimates for animal activity levels are therefore the result of an aggregation approach, in contrast to crop production.
+To generate good starting points for the subsequent data processing steps and to avoid systematic deviations between regional and national levels in the later consistency steps, all regional levels in REGIO are first scaled by the ratio of the (national) results in COCO to the regional results aggregated to the national level (key file is gams\capreg\map_from_regio.gms).
+Besides technological plausibility and a good match with existing regional statistics, the regionalized data for the CAPRI model must also be consistent with the national level. The minimum requirement for this consistency covers activity levels and gross production. The “initialisation” of the regional database has already been designed to meet this requirement as well as possible, but cannot guarantee it. Consistency for activity levels is therefore imposed with a Highest Posterior Density estimator, which ensures (in gams\capreg\cons_levls.gms):
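+The scaling step can be sketched as below. This is an illustration, not the code in map_from_regio.gms; the function name, region codes and numbers are hypothetical.

```python
def scale_to_national(regional_levels, national_level):
    """Scale regional levels of one activity so that they add up to the
    national (COCO) level."""
    regional_total = sum(regional_levels.values())
    if regional_total == 0:
        # Nothing to scale; leave the regional data unchanged.
        return dict(regional_levels)
    factor = national_level / regional_total
    return {region: level * factor
            for region, level in regional_levels.items()}


# Hypothetical regions whose levels sum to 80 while COCO reports 100.
scaled = scale_to_national({"R1": 30.0, "R2": 50.0}, 100.0)
```

+Every region is multiplied by the same factor (here 1.25), so regional shares are preserved while the national total is matched.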
+  - Adding up of activity levels from lower regional level (NUTS II, NUTS I) to higher ones (NUTS I, NUTS 0)
+  - Adding up of crop areas to UAA at regional level.
+For animal herds, the objective function minimizes simple squared relative deviations from the given herd sizes. For crops, it combines absolute squared differences of the crop shares in UAA, with a weight of 25%, and relative squared differences, with a weight of 75%. In the crop sector, consistency is also imposed on regional transition matrices for the 6 UNFCCC land use categories relevant for carbon accounting (forest land, cropland, grassland, settlements, wetlands, residual land), which are initialised from the national transition matrix estimated in the COCO1 module.
+A specific problem is that land use statistics do not report a breakdown of idling land into obligatory set-aside, voluntary set-aside and fallow land((The necessary additional information on non-food production on set-aside, obligatory and voluntary set-aside areas can be found on the DG-AGRI web server.)). Equally, the share of oilseeds grown as energy crops on set-aside needs to be determined. A Highest Posterior Density estimator is used (in gams\capreg\cal_seta.gms) to ‘distribute’ the national information on the different types of idling land to the regional level, with the following restrictions:
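+For intuition, the animal herd case can be solved in closed form when only a single adding-up constraint is considered. The sketch below is a simplification under that assumption. The estimator in cons_levls.gms handles several regional levels and the UAA constraint simultaneously, and the function name and numbers here are hypothetical. Minimizing the sum of squared relative deviations subject to the regional values adding up to the national total gives an adjustment proportional to the square of the starting value.

```python
def adjust_relative_ls(start, total):
    """Minimize sum(((x_r - s_r) / s_r) ** 2) subject to sum(x_r) == total.

    The first-order conditions give x_r = s_r + s_r**2 * gap / sum(s**2),
    where gap = total - sum(s). Requires strictly positive start values.
    """
    gap = total - sum(start.values())
    sum_sq = sum(s ** 2 for s in start.values())
    return {r: s + s ** 2 * gap / sum_sq for r, s in start.items()}


# Hypothetical herd sizes in two regions; the national herd is 35.
herds = adjust_relative_ls({"R1": 10.0, "R2": 20.0}, 35.0)
```

+In this example the gap of 5 is split into 1 for R1 and 4 for R2. Larger regions absorb more of the discrepancy because a given absolute change is a smaller relative deviation for them.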
+  * Obligatory set-aside areas must be equal to the set-aside obligations derived from areas and set-aside rates for Grandes Cultures (which may differ at regional level according to the share of small producers). For these crops, activity levels are partially endogenous in the estimation in order to allow a split of oilseeds into those grown under the set-aside obligations and those grown as non-food crops on set-aside.
+  * Obligatory and voluntary set-aside cannot exceed certain shares of the crops subject to set-aside (at least before the Agenda 2000 policy).
+  * Fallow land must equal the sum of obligatory set-aside, voluntary set-aside and other idling land.
+  * Total utilisable area must stay constant.
+In some cases, areas reported as fallow land are smaller than set-aside obligations. In these cases, parts of grassland areas and ‘other crops’ are allowed to be reduced.
+=== Production and yields ===
+The procedure for gross output (GROF) is similar to the one for activity levels, in that correction factors are applied to line up regional yields with the given national production:
+\begin{align}
+\begin{split}
+CORR_{GROF,o} &= GROF_{o,n}/\sum_{j,r}{Levl_{j,r}O_{j,r}}\\
+O_{j,r}^* &= O_{j,r}\cdot CORR_{GROF,o}
+\end{split}
+\end{align}
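+A minimal sketch of this correction for a single output follows. It assumes that the correction factor is the ratio of national gross output to the production implied by regional levels and yields, so that the corrected yields reproduce the national total. The function name, activity and region codes, and numbers are hypothetical.

```python
def correct_yields(levels, yields, national_output):
    """Scale regional yields of one output so that levels times yields
    add up to national gross output.

    levels and yields are keyed by (activity j, region r).
    """
    regional_output = sum(levels[key] * yields[key] for key in levels)
    # Assumed definition: national output over aggregated regional output.
    corr = national_output / regional_output
    corrected = {key: y * corr for key, y in yields.items()}
    return corrected, corr


# Hypothetical example: one activity in two regions.
levl = {("SWHE", "R1"): 10.0, ("SWHE", "R2"): 20.0}
yld = {("SWHE", "R1"): 5.0, ("SWHE", "R2"): 4.0}
new_yld, corr = correct_yields(levl, yld, 143.0)
```

+Here the regional data imply an output of 130 against a national value of 143, so all yields are raised by 10% and the corrected regional production matches the national figure.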