CAPRI aims to build a Policy Information System for the EU’s agricultural sector, regionalised at NUTS 2 level or by farm types inside NUTS 2 regions, with an emphasis on the impact of the CAP. The core of the system is a regionalized or farm type agricultural sector model using an activity based non-linear programming approach. One feature of such a highly disaggregated, activity based agricultural sector model is the detailed information that ex ante simulations of policy scenarios provide on the outputs and inputs of specific agricultural production activities and their relationships. This information is also a precondition for judging possible impacts of agricultural production on the environment. However, such systems also require this kind of information (data) ex post, at least partially. In particular, the matrix of I/O-coefficients for the different production activities, together with prices for these outputs and inputs, must be defined for each region in the model, at least for the base year. Moreover, information on land use and livestock numbers is needed for calibration and validation purposes.
From the beginning of the development of the CAPRI model, the regional agricultural statistics (EUROSTAT table group reg_agr) were judged to be the only harmonized source of regionalized agricultural data available in the EU. Other regional Eurostat data supplement the regional agricultural statistics, so that we are currently using the following:
Although the content of the regional datasets has remained largely the same over time, the naming and classification within EUROSTAT undergo continuous modification. Tables considered of low interest are discontinued (and may still be used in CAPRI for some time afterwards, such as table agr_r_landuse). New topics are also covered, providing useful data in some areas, for example from agri-environmental indicators (table reg_aei):
The following table shows the availability of the different regional tables as they have been used in the current database (with series completed up to 2014). However, the current coverage in terms of time and sub-regions differs dramatically between the tables and, within the tables, between the Member States. A second problem is the relatively high aggregation level, especially in the field of crop production. Hence, additional sources, assumptions and econometric procedures must be applied to close data gaps and to break down aggregated data.
Table 6 Availability of regional data in current database after 1983
Table | Official availability |
---|---|
Land use | from 1974 yearly |
Crop production (harvested areas, production and yields) | from 1975 yearly |
Animal production (livestock numbers) | from 1977 yearly |
Agricultural accounts on regional level | from 1980 yearly |
Structure of agricultural holdings and labour force | 2000, 2003, 2005, 2007, 2010, 2013 |
Source: capri\dat\capreg\regio_data_all.gdx
In the last major update of 2015 the original data were first stored in the TSV format designed by EUROSTAT:
The result of these two steps is a single large table, which comprises time series of all data retrieved from Eurostat for all tables: land use, crop production, animal populations, cow’s milk collection and agricultural accounts.
The starting point of the methodological approach is the decision to use the consistent and complete national data base (COCO) as a frame or reference point for any regionalization. In other words, any aggregation of the main data items (areas, herd sizes, gross production and intermediate use, unit value prices and EAA-positions) of the regionalized data over regions must match the national values. This is the general rule with some exceptions.
Given that starting position, the following approaches are generally applied:
All the approaches described in the following subsections are intended only as a first crude estimate. Wherever additional data sources are available, their content should be checked, and it is often used to replace the ‘easy to use’ estimates presented here. Examples are (some) data for Norway, Sweden or Luxembourg that have been collected from national sources. The procedures described here can be thought of as a ‘safety net’ that ensures regionalized data are technically available, but not as an adequate substitute for collecting these data from additional sources.
The agricultural domain of REGIO does not cover regionalized prices. For simplicity, the regional prices are therefore assumed to be identical to sectoral ones1):
\begin{equation} UVAG_r=UVAG_s \end{equation}
Young animal prices are a special case since they are not included in the COCO data base (the current methodology of the EAA does not value intermediate use of animals) but are necessary to calculate income indicators for intermediate activities (e.g. raising calves). Only exported or imported live animals are implicitly accounted for by valuing the connected meat imports and exports.
Young animals are valued based on the ‘meat value’ and assumed relationships between live and carcass weights. Male calves (ICAM, YCAM) are assumed to have a final weight of 55 kg, of which 60 % are valued at veal prices. Female calves (ICAF, YCAF) are assumed to have a final weight of 60 kg, of which 60 % are valued at veal prices. Young heifers (IHEI, YHEI) are assumed to have a final weight of 300 kg, of which 54 % are valued at beef prices. Young bulls (IBUL, YBUL) are assumed to have a final weight of 335 kg, of which 54 % are valued at beef prices. Young cows (ICOW, YCOW) are assumed to have a final weight of 575 kg, of which 54 % are valued at beef prices. For piglets (IPIG, YPIG), price notations were regressed on pig meat prices; piglets are assumed to have a final weight of 20 kg, of which 78 % are valued at pig meat prices. Lambs (ILAM, YLAM) are assumed to weigh 4 kg and are valued at 80 % of sheep and goat meat prices. Chickens (ICHI, YCHI) are assumed to weigh 0.1 kg and are valued at 80 % of poultry prices.
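The valuation rules above can be illustrated with a small Python sketch. It is not the CAPRI implementation: the function name, the short animal keys (standing for the I*/Y* code pairs), the meat category labels and the reading of each rule as "weight times valued share times meat price per kg" are assumptions made for illustration, and the regression used for piglets is not reproduced.

```python
# Illustrative sketch only; names and the per-head formula are assumptions.
# Each rule: (final weight in kg, share valued at meat price, meat category).
YOUNG_ANIMAL_RULES = {
    "CAM": (55.0, 0.60, "veal"),             # male calves (ICAM, YCAM)
    "CAF": (60.0, 0.60, "veal"),             # female calves (ICAF, YCAF)
    "HEI": (300.0, 0.54, "beef"),            # young heifers (IHEI, YHEI)
    "BUL": (335.0, 0.54, "beef"),            # young bulls (IBUL, YBUL)
    "COW": (575.0, 0.54, "beef"),            # young cows (ICOW, YCOW)
    "PIG": (20.0, 0.78, "pig_meat"),         # piglets (IPIG, YPIG)
    "LAM": (4.0, 0.80, "sheep_goat_meat"),   # lambs (ILAM, YLAM)
    "CHI": (0.1, 0.80, "poultry"),           # chickens (ICHI, YCHI)
}


def young_animal_value(animal, meat_prices_per_kg):
    """Return an approximate value per head from the 'meat value' rule."""
    weight_kg, valued_share, meat = YOUNG_ANIMAL_RULES[animal]
    return weight_kg * valued_share * meat_prices_per_kg[meat]


if __name__ == "__main__":
    # Hypothetical prices, chosen only to show the arithmetic.
    prices = {"veal": 5.0, "beef": 3.0}
    print(young_animal_value("CAM", prices))
    print(young_animal_value("BUL", prices))
```

With a hypothetical veal price of 5 per kg, a male calf is valued at 55 kg times 0.60 times 5 per head.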
Sugar beet prices are another special case. They are still determined in a program (‘sugar\price_est.gms’) inherited from the 2003 EuroCARE sugar study (Henrichsmeyer et al. 2003). It determines sugar beet prices from the sugar prices, levies and partial survey results of the 1990s. The estimation results are also used to determine the beet price differentiation in subsequent years. It is noteworthy that the same program is applied in CAPREG (via quotasprices.gms) and in CAPMOD (via data_prep.gms) to determine base year beet prices.
In cases where data on regional activity levels are missing, a linear trend line is estimated for the regional and Member State time series when the regional database is built. The gap is then filled with a weighted average of two components: the trend line, with a weight of R², and a weighted average of the available observations around the gap, with a weight of 1-R². This formulation has the following properties. In cases of a strong trend in a time series, the back-casted and forecasted numbers will be dominated by the trend, as the weight R² will be high. With decreasing R², the estimated values will be pulled towards known values.
Apart from gap filling, another problem is that annual cropland statistics at the regional level cover only a few crop activities (cereals with wheat, barley, grain maize, rice; potatoes, sugar beet, oil seeds with rape and sunflower; tobacco, fodder maize; grassland, permanent crops with vineyards and olive plantations). The COCO data base, however, covers some 30 different crop activities. In order to break these aggregates down to COCO definitions, the national shares of the aggregate are used.
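The gap-filling rule can be sketched in Python as follows. This is a minimal illustration, not the CAPREG code: the function name is hypothetical, and because the text does not specify how the observations around the gap are weighted, inverse distance in years is used here purely as an assumption.

```python
import numpy as np


def fill_gaps(years, values):
    """Fill NaN gaps with R2 * trend + (1 - R2) * local average (sketch)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    known = ~np.isnan(values)

    # Linear trend estimated from the available observations only.
    slope, intercept = np.polyfit(years[known], values[known], 1)
    trend = intercept + slope * years

    # R2 of the trend line; set to 0 for a constant series.
    ss_res = np.sum((values[known] - trend[known]) ** 2)
    ss_tot = np.sum((values[known] - values[known].mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    filled = values.copy()
    for i in np.where(~known)[0]:
        # ASSUMPTION: nearby observations weighted by inverse distance.
        weights = 1.0 / np.abs(years[known] - years[i])
        local = np.sum(weights * values[known]) / np.sum(weights)
        filled[i] = r2 * trend[i] + (1.0 - r2) * local
    return filled, r2


if __name__ == "__main__":
    series, r2 = fill_gaps([2000, 2001, 2002, 2003, 2004],
                           [1.0, 3.0, np.nan, 2.0, 4.0])
    print(series, r2)
```

When the trend fits well, R² is close to one and the filled value follows the trend line; for a noisy series the filled value moves towards the neighbouring observations.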
As an example, this approach is explained for cereals. Data on the production activities WHEA (wheat = SWHE+DWHE), BARL (barley), MAIZ (grain maize) and PARI (paddy rice) as found in COCO match directly the level of disaggregation in the regional data. Therefore, the mapped regionalized data are directly set equal to the corresponding values in the regional “raw” data. The difference between the sum of these 4 activities and the aggregate data on cereals in the regional raw data must be equal to the sum of the remaining activities in cereals as shown in COCO, namely RYE (rye and meslin), OATS (oats) and OCER (other cereals). As long as no other regional information is available, this difference from the regional raw data is hence broken down applying national shares.
The approach is shown for OATS in the following equations, where the suffix r stands for regional data:
\begin{align} \begin{split} LEVL_{OATS,r} &= (CEREAL_r\\ &\quad -WHEAT_r-BARLEY_r-MAIZEGR_r-RICE_r)\cdot\\ &\quad\frac{LEVL_{OATS,COCO}}{(LEVL_{OATS,COCO}+LEVL_{RYE,COCO}+LEVL_{OCER,COCO})} \end{split} \end{align}
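As a hedged illustration of the breakdown for OATS, RYE and OCER, the following Python sketch applies national shares to the regional cereal residual. The function name and the dictionary layout are assumptions for this example; the keys follow the symbols used in the equation.

```python
def split_residual_cereals(regional_raw, national_levels):
    """Distribute the regional cereal residual using national shares (sketch)."""
    # Residual = cereal aggregate minus the crops reported individually.
    reported = ("WHEAT", "BARLEY", "MAIZEGR", "RICE")
    residual = regional_raw["CEREAL"] - sum(regional_raw[c] for c in reported)

    # National shares of the remaining COCO cereal activities.
    remaining = ("OATS", "RYE", "OCER")
    national_total = sum(national_levels[a] for a in remaining)
    return {a: residual * national_levels[a] / national_total for a in remaining}


if __name__ == "__main__":
    # Hypothetical areas, chosen only to show the arithmetic.
    region = {"CEREAL": 100.0, "WHEAT": 40.0, "BARLEY": 20.0,
              "MAIZEGR": 10.0, "RICE": 0.0}
    coco = {"OATS": 2.0, "RYE": 1.0, "OCER": 3.0}
    print(split_residual_cereals(region, coco))
```

By construction the three estimated levels add up to the regional residual, so the regional cereal aggregate is preserved.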
Similar equations are used to break down other aggregates and residual areas in the regional data 2). The Farm Structure Survey (FSS) provides crop areas for a larger number of crops, but this survey is usually conducted only every three years. Data from FSS, when available, are also used to approximate crop areas at regional level.
One important advantage of the approach is that the resulting areas are automatically consistent with the national data, provided the incoming information from REGIO was consistent with the national level. Fortunately, the regional information on herd sizes covers most of the data needed to derive good proxies for all animal activities in the COCO definition. The regional breakdown of herd sizes is often more detailed than COCO, at least for the important sectors. Regional estimates for the activity levels are therefore the result of an aggregation approach, in contrast to crop production.
In order to generate good starting points for the following steps of data processing and to avoid systematic deviations between regional and national levels in the following consistency steps, all regional levels in REGIO are first scaled by the ratio of the (national) results in COCO to the regional results aggregated to the national level (key file is gams\capreg\map_from_regio.gms).
Besides technological plausibility and a good match with existing regional statistics, the regionalized data for the CAPRI model must also be consistent with the national level. The minimum requirement for this consistency includes activity levels and gross production. The “initialisation” of the regional database has already been designed to meet this requirement as well as possible but cannot guarantee it. Consistency for activity levels is therefore imposed with a Highest Posterior Density estimator, which ensures (in gams\capreg\cons_levls.gms):
For animal herds, the objective function minimizes simple squared relative deviations from the initial herd sizes. For crops, it combines the absolute squared differences of the crop shares in UAA, with a weight of 25%, and the relative squared differences, with a weight of 75%. In the crop sector, consistency is also imposed on the regional transition matrices for the 6 UNFCCC land use categories relevant for carbon accounting (forest land, cropland, grassland, settlements, wetlands, residual land), which are initialised from the national transition matrix estimated in the COCO1 module.
A specific problem is that land use statistics do not report a breakdown of idling land into obligatory set aside, voluntary set aside and fallow land3). Equally, the share of oilseeds grown as energy crops on set aside needs to be determined. A Highest Posterior Density estimator is used (in gams\capreg\cal_seta.gms) to ‘distribute’ the national information on the different types of idling land to the regional level, with the following restrictions:
In some cases, areas reported as fallow land are smaller than set-aside obligations. In these cases, parts of grassland areas and ‘other crops’ are allowed to be reduced.
The procedure for gross output (GROF) is similar to the one for activity levels, as correction factors are applied to line up regional yields with given national production:
\begin{align} \begin{split} CORR_{GROF,o} &= GROF_{o,n}/\sum_{j,r}{Levl_{j,r}O_{j,r}}\\ O_{j,r}^*&=O_{j,r} \cdot CORR_{GROF,o} \end{split} \end{align}
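The yield correction can be sketched in Python for a single output. This is an illustration under assumptions rather than the CAPREG code: the function name and data layout are hypothetical, and the correction factor is taken here as national production divided by aggregated regional production, so that multiplying the regional yields by it reproduces the national total.

```python
def scale_yields_to_national(levels, yields, national_production):
    """Scale regional yields so that regional production sums to the national value."""
    # Aggregated regional production: sum of activity level times yield.
    regional_production = sum(levels[r] * yields[r] for r in levels)
    corr = national_production / regional_production
    scaled = {r: yields[r] * corr for r in yields}
    return scaled, corr


if __name__ == "__main__":
    # Hypothetical levels (e.g. 1000 ha) and yields (e.g. t/ha) for two regions.
    levels = {"R1": 10.0, "R2": 20.0}
    yields = {"R1": 5.0, "R2": 4.0}
    scaled, corr = scale_yields_to_national(levels, yields, 143.0)
    print(scaled, corr)
```

A single factor per output keeps the relative yield differences between regions unchanged while closing the gap to the national production.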
In case of missing statistical information for regional yields, national yields are used. A special rule is used for fodder maize yields, where regional yields are derived from national fodder maize yields, and the relation between regional and national average cereal yields.
For grassland and fodder from arable land, missing yields are derived from national ones using the relation between regional and national stocking densities of ruminants, in combination with an assumed share of concentrates in terms of a weighted sum of energy and protein per ruminant activity in CAPRI. Those shares are then scaled with a uniform factor to exhaust, on average, the available energy and protein from concentrates at the national level. Accordingly, higher fodder yields are expected where ruminant stocking densities are high, acknowledging differences in concentrate shares. If, for example, the stocking densities stem solely from sheep and goats, the assumed impact on yields is higher. In order to avoid unrealistically low or high yields, they are bounded to a 25%-400% range compared to the regional aggregate.
The input allocation in any given year should not be linked to realised, but to expected yields. Expected yields are constructed using the following modified Hodrick-Prescott filter:
\begin{equation} \text{min} \quad hp=1000 \sum_{1<t<T-1}({y_{t+1}^*-y_{t-1}^*})^2 + \sum_{t}({y_t^*-y_t})^2 \end{equation}
where y covers all output coefficients in the data base and y* denotes the filtered (expected) values. The Hodrick-Prescott filter is applied both at the national and regional level after any gaps in the time series have been closed.
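Because the filter objective is quadratic in the expected yields, its minimum can be obtained from a linear system. The Python sketch below implements the objective exactly as written in the equation, with the penalty on y*(t+1) - y*(t-1) and a weight of 1000. It is an illustration only: the function names are hypothetical, and nothing is implied about how the GAMS code solves the problem.

```python
import numpy as np


def hp_objective(y_star, y, smooth=1000.0):
    """Value of the modified filter objective for a candidate series y_star."""
    y_star = np.asarray(y_star, dtype=float)
    y = np.asarray(y, dtype=float)
    # Interior terms: 1 < t < T-1 in 1-based indexing.
    penalty = sum((y_star[t + 1] - y_star[t - 1]) ** 2
                  for t in range(1, y.size - 2))
    return smooth * penalty + np.sum((y_star - y) ** 2)


def expected_yields(y, smooth=1000.0):
    """Minimise the objective by solving (I + smooth * D'D) y_star = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    interior = range(1, n - 2)  # same index set as in hp_objective
    d = np.zeros((len(interior), n))
    for k, t in enumerate(interior):
        d[k, t + 1] = 1.0
        d[k, t - 1] = -1.0
    a = np.eye(n) + smooth * (d.T @ d)
    return np.linalg.solve(a, y)


if __name__ == "__main__":
    # Hypothetical yield series with year-to-year fluctuations.
    observed = [4.0, 5.5, 4.2, 6.0, 4.8, 6.3, 5.1]
    print(expected_yields(observed))
```

The linear system follows from setting the gradient of the objective to zero; a constant series is returned unchanged because its penalty term is already zero.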
The regional database modules also cover some aspects which are discussed in other parts of this documentation.
The regionalised data base module CAPREG runs in two steps:
To assess the reliability of the CAPRI database in terms of GHG results against official UNFCCC notifications, results from the first step (time series) were insufficient, as the GHG accounting also requires information on the feed allocation. This problem was addressed within the scope of the IDEAg (Improving the quantification of GHG emissions and flows of reactive nitrogen) project4), where an option has been introduced to allow for a consistent accounting of GHG emissions over time. This is able to combine input information from CAPREG time series runs as well as (short run, nowcasting-style) CAPMOD simulation results. Furthermore, an R-based tool was introduced to the CAPRI GUI that maps GHG emissions data from CAPRI to the GHG emission balances contained in the National Inventory Reports (NIRs) that are submitted annually by countries in compliance with UNFCCC GHG reporting obligations.