A cluster analysis of variables essential for climate change adaptation of smallholder dairy farmers of Nandi County, Kenya

Smallholder dairy farmers occupy the high-potential areas of Kenya and are a source of manure, crops and milk. Because most of them practise mixed farming, other means of characterising them are needed. The objective of this paper is to use cluster analysis to characterise smallholder dairy farmers using additional farmer and activity variables. Clusters of the 336 farmers in this study were derived using 28 key variables. The paper demonstrates how farmer assessments for climate change adaptation activities and the implementation of climate-smart technologies can draw on knowledge of key farmer variables and their distribution among the smallholder dairy farmers of Nandi County, Kenya. It also demonstrates the importance of feeding agricultural information on smallholder dairy farmers into machine learning models to characterise the groups and observe their natural groupings. This allows policy managers to identify the key characteristics and use them in policy implementation, especially when designing climate change adaptation programmes that factor in the education and training of farmers, who, as this paper shows, practise many activities on their farms.


INTRODUCTION
Sub-Saharan Africa (SSA) has the fastest growth in agriculture and the greatest level of agricultural imports compared to other global regions (Livingston et al., 2011). This growth follows huge demand for food, which cements the importance of smallholder agriculture to food security in SSA (Bellarby et al., 2014). Smallholder farmers play a key role in development in Africa, especially in recent years (Hammond et al., 2015; Livingston et al., 2011; Salami et al., 2010). In many studies, smallholder farmers are characterised on the basis of land size (Bebe et al., 2002; Herrero et al., 2014; van Averbeke and Mohamed, 2006).
The adverse impacts of climate change, such as changes in rainfall and temperature, are felt heavily by smallholder farmers (Altieri and Koohafkan, 2008; Koohafkan et al., 2012). Smallholder farmers are highly diverse, even though farm size is the key variable used to characterise them (Bebe et al., 2002; Brandt et al., 2018). This diversity arises because smallholder farmers mix livestock and crop production, and it is evident among smallholder dairy farmers in SSA (Bebe et al., 2002; Oborn et al., 2017; Staal et al., 2002). One of the main reasons climate change affects smallholder farmers is the excessive mining of nutrients on their farms, which lowers production and increases susceptibility to climate change effects (Bationo et al., 2004; Castellanos-Navarrete et al., 2015; Rufino et al., 2007).
For smallholder dairy farmers to adapt to climate change, they need to be characterised using more variables than farm size alone (Auburger et al., 2015; Herrero et al., 2014; van Averbeke and Mohamed, 2006; Vrieling et al., 2011). Additional variables are needed, such as the land use sizes of the farms, labour, type of livestock housing and other key variables that define the smallholders' enterprise (Nyambo et al., 2019; Staal et al., 2002; Waithaka et al., 2007; Zake et al., 2010). Such characterisation should use robust unsupervised models and integrated systems research to reveal natural groupings of smallholder dairy farmers, allowing policy makers to promote climate change adaptation practices (Nyambo et al., 2019; Oborn et al., 2017; Thompson, 2016). The lack of a characterisation of smallholder farmers that robustly includes key variables based on land survey limits the options for policy makers as farm sizes decline over time (Bebe et al., 2003). This paper seeks to use cluster analysis to characterise the smallholder dairy farmers of Nandi County, Kenya.

Study area
The field study was conducted within Nandi County, Kenya (bounded by 0.565°N, 34.736°E; 0.565°N, 35.437°E; 0.118°S, 35.437°E; and 0.118°S, 34.736°E). Mean annual temperatures range from 18 to 22°C, with temperatures at lower elevations (< 1400 m) going as high as 26°C. Altitude ranges from approximately 600 m a.s.l. in the south to over 2200 m a.s.l. in the north-east of the county. The highlands are recognised for their high agricultural potential (GOK, 2015; Mudavadi et al., 2001). Nevertheless, livestock and crop farming is mainly subsistence, with average land sizes of approximately 4.5 ha per household. Dairy production is common throughout the county, with tea as a major cash crop and maize as the primary staple crop (GOK, 2015).

Site classification
Agro-ecological zones (AEZs) were identified on the basis of altitude, rainfall, temperature and predominant land use (GOK, 2015). This resulted in three major AEZs: lower highland 1 (LH1: 1900-2400 m a.s.l., area of 934.3 km², high seasonal variation in rainfall and thus distinct long and short rains, main crops tea and maize); lower highland 2 (LH2: 1400-1900 m a.s.l., area of 1100.7 km², low seasonal variation in rainfall characterised by bimodal rainfall, with November-January as the short rains and May-July as the long rains, main crops tea and maize); and upper midlands (UM: 1200-1400 m a.s.l., area of 364.7 km², high seasonal variation in rainfall, main crops sugarcane and maize). A participatory mapping exercise drawing on the expert knowledge of personnel from the International Livestock Research Institute (ILRI) and the Nandi County government was conducted to predict whether dairy production systems differed across the AEZs. Thirty-six sampling points were generated in QGIS based on nearness to road infrastructure and masked away from forested areas, on the assumption that there are no households on roads or in forests. The sampling points were kept away from forests because forests are legally gazetted and section 64 (1) of the Forest Conservation and Management Act, number 34 of 2016, prohibits grazing in them (Republic of Kenya, 2016). The number of sampling points assigned to each of the three AEZs (clusters) was weighted by the area of each AEZ, resulting in 15 sites each in LH1 and LH2 and six sites in the UM (Figure 1). At each of the 36 randomly selected points, nine farmers were interviewed to generate a sample size of 336 households.

Household surveys and questionnaire
The household surveys used a questionnaire tool customised from the Integrated Modelling Platform for mixed Animal Crop systems (IMPACTlite) (Rufino et al., 2013). IMPACTlite was modified from IMPACT (Herrero et al., 2007) to collect household-level data detailed enough to capture within-site variability in key farm performance and livelihood indicators. It was initially developed to encourage data sharing through standard protocols and to allow tools to be linked to facilitate evaluations of various farming systems (Rufino et al., 2013). The household questionnaire was completed through face-to-face interviews with the household head using the Open Data Kit (ODK) platform (ODK, 2017). When the household head was absent, the most senior member available or the household member responsible for the farm was interviewed. Data were collected on the primary income categories (crop, dairy, poultry and others), farm size and farmer demographics (literacy, age and gender). From this, four animal confinement systems were defined: 'fence only (F)', 'fence and floor (FF)', 'fence and roof (FR)' and 'fence, roof and floor (FRF)'. The animal confinements formed the base for the manure management systems, so that each confinement could be related to the manure management systems in use.
Figure 1. Results of silhouette plot analysis for the Nandi County dataset showing the optimal four clusters.
Manure management systems were classified according to Table 10.18 provided in the guidelines for GHG emission estimates (IPCC, 2006) and were characterised based on the state of the manure being deposited, the location where the manure is stored and the duration of storage. The state of manure was defined as 'fresh' (less than 24 h from excretion), 'dry' (more than 24 h from excretion) or 'bioslurry' (from biodigesters or farms with a pit or lagoon where liquid manure is collected). Manure handling and storage was characterised as: 'heap for composting', 'pit for fresh or dry manure', 'heap of either fresh or dry manure' and 'pit/lagoon for slurry'. The duration of manure storage before utilisation on the farm is a proxy for manure quality. Therefore, manure storage was classified according to three periods: less than 1 month, equalling good quality; 3-4 months, meaning reduced quality; and greater than four months, equalling the least manure quality.
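The storage-duration rule above can be written as a small lookup. The following Python sketch is illustrative only: the function name `manure_storage_quality` is hypothetical, and because the text does not assign a quality class to durations from 1 month up to 3 months, the sketch returns `None` for that range instead of guessing.

```python
def manure_storage_quality(months):
    """Map storage duration (in months) to the quality class described in the text.

    Hypothetical helper for illustration; the thresholds come from the text,
    and durations it does not cover are returned as None.
    """
    if months < 0:
        raise ValueError("storage duration cannot be negative")
    if months < 1:
        return "good"      # less than 1 month
    if 3 <= months <= 4:
        return "reduced"   # 3-4 months
    if months > 4:
        return "least"     # greater than four months
    return None            # 1 to <3 months: not specified in the text


for duration in (0.5, 2, 3.5, 6):
    print(duration, manure_storage_quality(duration))
```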

Cluster analysis
An unsupervised learning algorithm, K-means, was used for the cluster analysis (Chibanda et al., 2009; Nyambo et al., 2019). In the analysis, the number of groups (K) represented how many farm typologies (clusters) could be defined for each dataset.
The number of clusters that best represented the data was determined using the Elbow method, in which a bend (elbow) in a graph of the decline in within-cluster sum of squares as the number of clusters increases indicates the best solution (Nyambo et al., 2019). The Elbow method examines the percentage of variance explained by the clustering as a function of the number of clusters k (Kingrani et al., 2017; Syakur et al., 2018). The K-means algorithm has been widely used in non-hierarchical clustering and in characterising smallholder dairy farms (Kingrani et al., 2017; Nyambo et al., 2019; Tittonell et al., 2010). The algorithm uses Euclidean distance measures to estimate the weights of data records. Its objective function is presented as Equation 1:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \qquad (1)$$

where $\left\| x_i^{(j)} - c_j \right\|$ is the Euclidean distance between an observation and a cluster centre; k = number of clusters; n = number of observations; j = cluster index (starting from 1); i = observation index (starting from 1); $x_i^{(j)}$ = Euclidean vector for the ith observation (assigned to cluster j); and $c_j$ = cluster centre of the jth cluster. The production clusters output by the clustering algorithm were validated in three ways: (1) assessment of cluster robustness, (2) comparison of cluster membership reallocation (differential allocation of households to clusters for the training and testing datasets), and (3) evaluation of the proportion of variation explained by the clusters.
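The K-means objective and the Elbow method described above can be sketched in a few lines. The study itself reports using base R; the Python code below is only a minimal illustration under stated assumptions: the toy data (four synthetic groups) are invented, all function names are hypothetical, and the deterministic farthest-first choice of starting centres is used for reproducibility and is not taken from the study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centres(points, k):
    """Deterministic starting centres (an assumption, not the study's setting)."""
    centres = [list(points[0])]
    while len(centres) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centres))
        centres.append(list(nxt))
    return centres


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns labels, centres and within-cluster sum of squares."""
    centres = farthest_first_centres(points, k)
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(squared_distance(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, wcss


def elbow_curve(points, k_max):
    """Within-cluster sum of squares for k = 1 .. k_max."""
    return {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}


def explained_variation(curve):
    """Proportion of total variation explained; the total equals the WCSS at k = 1."""
    total = curve[1]
    return {k: 1 - wcss / total for k, wcss in curve.items()}


# Invented toy data: four well-separated groups of 30 two-dimensional points.
rng = random.Random(42)
blob_centres = [(0, 0), (10, 0), (0, 10), (10, 10)]
toy = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
       for cx, cy in blob_centres for _ in range(30)]

curve = elbow_curve(toy, 6)
explained = explained_variation(curve)
for k in sorted(curve):
    print(k, round(curve[k], 1), round(explained[k], 3))
```

On data like these, the within-cluster sum of squares drops sharply up to the true number of groups and flattens afterwards, which is the bend the Elbow method looks for. Real survey variables of mixed type would normally need encoding and scaling before distances are computed; that step is outside this sketch.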

Feature selection
The top 28 features synthesised from the literature on smallholder dairy farmers were tabulated (Table 1). Based on experts' domain knowledge, these variables are known to influence productivity in smallholder dairy farming. The features were Boolean, discrete or continuous and were derived from the household survey of 336 smallholder dairy farmers in Nandi County. They were used to identify 'natural groupings' of these 28 features and so derive the number and type of clusters (Chibanda et al., 2009; Nyambo et al., 2019; Syakur et al., 2018). This was done by minimising the squared Euclidean distance within a decreasing number of clusters containing an increasing number of positively related variables, using the base R package (RStudio V 1.1.442), in which a dendrogram and a plot showing the optimal number of clusters using K-means were generated (Chibanda et al., 2009). Each of the variables used in clustering was described as percentages (gender, education level, income  (Nyambo et al., 2019). Rank analysis using the Spearman correlation coefficient was used to evaluate the level of feature reallocation between clusters. Hierarchical clustering was applied in a bottom-up manner. That is, each variable object is initially considered a single-element cluster (leaf). At each step of the algorithm, the two most similar clusters are combined into a new, bigger cluster (node). This procedure is iterated until all points are members of one single big cluster (root). Quality control was achieved using cross-clustering, a partial clustering algorithm that combines Ward's minimum variance and complete linkage algorithms, providing automatic estimation of a suitable number of clusters and identification of outlier elements (Tellaroli et al., 2016).
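The bottom-up merging procedure described above can be illustrated with a short sketch. The Python code below is not the study's R workflow: it uses complete linkage only (one of the two linkages mentioned in the text), a hypothetical function name, and five invented points, and it is written for readability rather than speed.

```python
import math


def agglomerate_complete_linkage(points, n_clusters=1):
    """Bottom-up clustering: start from single-element clusters (leaves) and
    repeatedly merge the two closest clusters until n_clusters remain.

    Under complete linkage, the distance between two clusters is the largest
    pairwise distance between their members.
    """
    clusters = [[i] for i in range(len(points))]  # each item starts as a leaf
    merges = []                                   # record of (left, right, distance)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]   # new, bigger cluster (node)
        del clusters[b]                           # b > a, so index a is unaffected
    return clusters, merges


# Invented toy data: two tight pairs and one isolated point.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups, history = agglomerate_complete_linkage(toy_points, n_clusters=3)
print(groups)
for left, right, distance in history:
    print(left, right, round(distance, 2))
```

Running the function with `n_clusters=1` continues merging up to the root, and the recorded merge distances are the heights a dendrogram would display.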

Clustering
Based on the Elbow method, a four-cluster solution was found to be optimal for the Nandi County dataset and was fitted in the clustering model (Figures 1 and 2). This was confirmed by quality control using cross-clustering and outlier identification. Cluster analysis was used to classify the smallholder farmers and to examine how the key variables of acreage for grazing, total acreage, education level, number of dairy cattle in the household and manure management affect their labour practices and major income categories (Table 2). This agrees with observations from other studies on smallholders where such variables were enumerated (Nyambo et al., 2019). The study found four classes split by gender and major income categories. When discrete variables were used, the clusters highlighted focus areas such as the low education level of farmers, and they were predominantly male. The clusters had low total acreage and small areas available for grazing, and the farmers had limited labour and high dairy livestock numbers. This finding agrees with Chibanda et al. (2009), Tittonell et al. (2010) and van Averbeke and Mohamed (2006), whose studies found that cluster analysis identified the key variables that define the components for practice change. Studies have shown how farmer education can improve various farm practices and subsequently support climate change adaptation (Ausden, 2014; Boswell et al., 2010; Waithaka et al., 2007; Zake et al., 2010). The basic aim of cluster analysis is to find the 'natural groupings', if any, of a set of individuals (cases or variables). This was an objective of this study, and it agrees with other studies in which the advantages of cluster analysis were extolled (Adeyemo et al., 2019; Chibanda et al., 2009; Kwale, 2013).
The analysis showed that main income category, labour numbers, dairy livestock populations and grazing area acreage were the key driving variables for the smallholder farmers. This led to four clusters based on gender and education levels, with differences occurring in the quantities of the other variables. Thus, the four clusters in this study were the natural groupings of the smallholder dairy farmers.
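The caption of Figure 1 refers to a silhouette analysis supporting the four-cluster solution. As a hedged illustration of what such an analysis measures, the Python sketch below computes the average silhouette width for a given labelling. The function names and the four toy points are invented, and the code is not the study's R workflow.

```python
import math


def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every point, where a is
    the mean distance to the other members of the point's own cluster and b is
    the smallest mean distance to the members of any other cluster."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette analysis needs at least two clusters")
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for single-element clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette(points, labels):
    """Mean silhouette width; values near 1 indicate well-separated clusters."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


# Invented toy data: two obvious pairs, labelled sensibly and then badly.
toy_pairs = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette(toy_pairs, [0, 0, 1, 1])
bad = average_silhouette(toy_pairs, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

In practice the average width is computed for several candidate numbers of clusters, and the number giving the highest value is taken as a candidate for the optimal solution.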

CONCLUSION AND RECOMMENDATION
There are many ways to use cluster analysis. The kind used in this study forms sets of similar variables. Its purpose here is, therefore, to draw inferences about the natural groupings of smallholder dairy farmers and the nature of the key variables used in these groupings. Hierarchical clustering was appropriate for this purpose and can also be applied to qualitative variables (Kwale, 2013; Nyambo et al., 2019). The major result of this study was that, with as many as 28 variables, the cluster analysis revealed only four distinct natural groupings. In this study, cluster analysis was used to test the proposition that there are simple natural groupings of smallholder dairy farmers, and this proposition is mirrored by the finding that the analysis yielded distinct groupings (Chibanda et al., 2009; Condliffe et al., 2008). The study recommends future research that uses cluster analysis to give a detailed characterisation of other areas and types of farmers, to compare counties and to explore scenarios if scaled to the region.