SOLVING THE PROBLEM OF MISSING DATA

After deciding which indicators to include in an index, we collect the data. At this point we learn how much data is missing and whether we can cover the selected group of indicators. If a lot of data is missing and cannot be imputed meaningfully, we have to exclude some indicators from the analysis, i.e. drop the variables or instances for which the data is missing. Case deletion is thus the first and most radical way of dealing with the problem. It is used when, for example, the values for most of the observed countries are missing for the larger part of the observed period, so that the missing values cannot be estimated and imputed. The Human Development Index, for example, deals with missing data in this way: it restricts itself to a small set of indicators for which complete data across countries is relatively easy to obtain. On the one hand this is good, since it gives objective and more reliable results; on the other hand, the index cannot cover all important aspects of a phenomenon and has to be restricted to the basic ones.

The case deletion approach is appealing because of its simplicity. However, it is not applicable when missing values affect many instances or are heavily present in essential attributes (Little & Rubin, 1987).

If this radical exclusion of indicators is not needed or wanted, composite indices can deal with missing data in three ways (Foa & Tanner, 2012). The first and simplest solution is to drop any country for which complete data does not exist; the second is to impute missing values using different methods; and the third is to use only the existing data in the estimation of the index, supplemented with an estimated margin of error.
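As an illustration of the deletion options described above, the following minimal sketch drops incomplete countries or sparsely observed indicators. The country codes, indicator names, values and the `max_missing_share` threshold are hypothetical and serve only to show the mechanics; they are not taken from any of the indices cited.

```python
# Hypothetical toy data: None marks a missing indicator value.
data = {
    "A": {"gdp": 1.0, "literacy": 0.90, "internet": 0.5},
    "B": {"gdp": 2.0, "literacy": None, "internet": 0.7},
    "C": {"gdp": 1.5, "literacy": 0.80, "internet": None},
    "D": {"gdp": 3.0, "literacy": 0.95, "internet": 0.9},
}


def drop_incomplete_countries(data):
    """Case deletion by country: keep only countries with no missing value."""
    return {
        country: row
        for country, row in data.items()
        if all(value is not None for value in row.values())
    }


def drop_sparse_indicators(data, max_missing_share):
    """Case deletion by indicator: drop indicators missing too often."""
    n_countries = len(data)
    indicators = list(next(iter(data.values())))
    keep = [
        ind
        for ind in indicators
        if sum(row[ind] is None for row in data.values()) / n_countries
        <= max_missing_share
    ]
    return {country: {ind: row[ind] for ind in keep} for country, row in data.items()}


complete = drop_incomplete_countries(data)
reduced = drop_sparse_indicators(data, max_missing_share=0.2)
```

In this toy example, deleting by country halves the sample, while deleting by indicator keeps all four countries but leaves a single indicator, which mirrors the trade-off between coverage of countries and coverage of aspects discussed above.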
The first solution, reducing the sample, is not always acceptable, especially if we want to make global cross-country comparisons. However, there are indices, such as the Doing Business Indicators, whose authors want to avoid the methodological problems and obtain an objective result (Foa & Tanner, 2012). This is why their country domain is smaller than that of, for example, the ArCo index of technological capability of countries (Archibugi & Coco, 2004), which is measured for 162 countries. So, in order to enlarge the sample, we need to find the right way of imputing the missing values, always bearing in mind that the choice of method influences the results. We discuss this in detail in Section 2.1.

The third solution is to use only the existing data in the estimation of the index, but to supplement the results with an estimated margin of error based on the number of missing items, among other criteria. This approach is used in a number of recent indices, such as the Corruption Perceptions Index (CPI), where the confidence range indicates the reliability of the country scores. It tells us that, allowing for a margin of error, we can be 90 percent sure that the true score of a country lies within the given range.

2.1. Missing data imputation techniques – an overview

The literature on the analysis of missing data is extensive and developing rapidly. OECD & EC-JRC (2008) published the Handbook on Constructing Composite Indicators: Methodology and User Guide, which provides guidance on creating indices. The Handbook aims to contribute to a better understanding of the complexity of composite indicators and to an improvement in the techniques currently used to build them. Among other topics, the Handbook deals with the problem of missing data imputation and suggests single and multiple imputation as possible solutions.
As defined in the Handbook, “imputations are means or draws from a predictive distribution of missing values.” The predictive distribution must be generated by using the observed data.

Single imputation covers both implicit and explicit modelling. The implicit techniques are simple. Hot deck imputation fills in the blank cells with individual data drawn from a unit that has similar characteristics. For example, if we observe units on four indicators and one unit is missing a value for indicator x, we fill in that value with the value of indicator x from the unit most similar to it on the other three indicators. Substitution means replacing non-responding units with unselected units in the sample, while cold deck imputation replaces missing values with values from an external source (for example, from a previous realization of the same survey, or the value of an indicator from the previous year when assessing countries' performance). Additionally, we propose simple mean imputation, which considers only one instance and its own data (separately from the sample) and imputes the missing value as the average of the values immediately before and after it. It is important to keep in mind that when assessing countries' performance we deal with time-series data.

Explicit modelling is more complex and demands more detailed explanation. Unconditional mean imputation replaces the missing values with the sample mean (or median, or mode) of the observed indicator. The limitation of mean-based imputation and its variations is that it focuses on a specific variable without taking into account the overall similarities between instances (Ayuyev et al., 2009). This is the easiest form of explicit modelling, but it is not always precise enough. Therefore, two more sophisticated techniques can be used.
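The simple single-imputation techniques described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Handbook's implementation: similarity in the hot deck step is measured by Euclidean distance on the other indicators (the text does not fix a distance measure), only complete units act as donors, and all function names and data are hypothetical.

```python
import math
from statistics import mean


def hot_deck_impute(units, target, indicator):
    """Return the donor value for units[target][indicator].

    The donor is the complete unit closest to the target on the other
    indicators (Euclidean distance is an assumption of this sketch).
    The target must be observed on all other indicators.
    """
    others = [k for k in units[target] if k != indicator]
    donors = [
        name
        for name, row in units.items()
        if name != target and all(v is not None for v in row.values())
    ]

    def distance(name):
        return math.sqrt(
            sum((units[name][k] - units[target][k]) ** 2 for k in others)
        )

    return units[min(donors, key=distance)][indicator]


def neighbour_mean_impute(series):
    """Fill an isolated gap in a time series with the mean of its neighbours."""
    filled = list(series)
    for t in range(1, len(series) - 1):
        if (
            series[t] is None
            and series[t - 1] is not None
            and series[t + 1] is not None
        ):
            filled[t] = (series[t - 1] + series[t + 1]) / 2
    return filled


def unconditional_mean_impute(values):
    """Replace every missing value with the mean of the observed values."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


# Hypothetical units observed on four indicators; unit "A" lacks indicator x.
units = {
    "A": {"x": None, "a": 1.0, "b": 2.0, "c": 3.0},
    "B": {"x": 10.0, "a": 1.1, "b": 2.1, "c": 2.9},
    "C": {"x": 50.0, "a": 5.0, "b": 6.0, "c": 7.0},
}
donor_value = hot_deck_impute(units, "A", "x")
```

Note that `neighbour_mean_impute` leaves gaps at the start or end of a series, and runs of consecutive gaps, untouched, since no previous and next observed value exist for them.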
The first is regression imputation, where missing values are replaced with the predicted values obtained by regression. Here we distinguish a dependent variable and one or more independent variables. The dependent variable is the indicator for which some values are missing, and the independent variables are the individual indicators that show a strong relationship (usually a high correlation) with the dependent variable.

The second, expectation maximization imputation, focuses on the interdependence between the parameters of the model and the missing values. It is an iterative process. First, the missing values are predicted from initial estimates of the model parameters. These predictions are then used to update the parameter values, and the process is repeated. The sequence of parameters converges to the maximum-likelihood estimates, and the time to convergence depends on the proportion of missing data and the flatness of the likelihood function. For a more detailed mathematical explanation of explicit modelling techniques, see OECD & EC-JRC (2008, pp. 55-58).

Multiple imputation is considered to be one of the most powerful approaches to missing value estimation (Ayuyev et al., 2009). It is a general approach that does not require the specification of a parameterised likelihood for all data. The missing data is imputed with a random process that reflects uncertainty. Imputation is done N times, creating N “complete” datasets. The parameters of interest are estimated on each dataset, together with their standard errors. The estimates from the N datasets are then combined (as a mean or median), and the between- and within-imputation variance is calculated.

Although any imputation method can be used in multiple imputation (applied repeatedly to obtain N values), one of the most general models is the Markov Chain Monte Carlo (MCMC) method. A Markov chain is a sequence of random variables in which the distribution of each element depends on the value of the previous one.
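The multiple-imputation procedure described above (impute N times, estimate on each completed dataset, then combine the estimates and their variances) can be sketched with regression imputation as the underlying single-imputation method. This is a simplified illustration, not the MCMC approach: it uses one predictor, adds normally distributed residual noise to each regression prediction, and keeps the regression coefficients fixed across imputations, whereas a full multiple-imputation scheme would also draw the coefficients. The pooling step uses the standard combining rules (total variance = within + (1 + 1/N) × between). Function names and data are hypothetical.

```python
import random
from statistics import mean, variance


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    residual_sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return a, b, residual_sd


def multiple_imputation_mean(xs, ys, n_imputations=20, seed=0):
    """Estimate the mean of y (None = missing) from N imputed datasets.

    Returns (pooled_estimate, within_var, between_var, total_var).
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])

    estimates, within = [], []
    for _ in range(n_imputations):
        # Stochastic regression imputation: prediction plus random noise.
        completed = [
            y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)
        ]
        estimates.append(mean(completed))
        # Squared standard error of the mean on this completed dataset.
        within.append(variance(completed) / len(completed))

    pooled = mean(estimates)
    within_var = mean(within)
    between_var = variance(estimates)
    total_var = within_var + (1 + 1 / n_imputations) * between_var
    return pooled, within_var, between_var, total_var


# Hypothetical indicator y with two gaps, and a correlated indicator x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, None, 12.1]
pooled, within_var, between_var, total_var = multiple_imputation_mean(xs, ys)
```

Dropping the `rng.gauss` term turns each pass into plain (deterministic) regression imputation; the between-imputation variance would then be zero, which is exactly the uncertainty that single imputation fails to report.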
The MCMC method assumes that the data are drawn from a multivariate normal distribution and requires that the data be missing at random (MAR) or missing completely at random (MCAR). For a more detailed explanation, see OECD & EC-JRC (2008, pp. 58-61). For example, the Environmental Sustainability Index uses the MCMC technique to substitute the missing values (Srebotnjak, 2001).

Based on the number of operations performed, Zhang (2011) presents the following categorisation of imputation techniques: single, multiple, fractional and iterative. Fractional imputation represents a compromise between the single and multiple imputation methods, while iterative imputation techniques primarily use a generate-and-test mechanism, taking into account useful information (including incomplete cases).

Fujikawa and Ho (2002) consider a clustering-based approach to missing data imputation, where the premise is that units can be grouped such that all the imputations in the identified groups are independent of other groups. Distance-based clustering is focused mainly on the development of supervised clustering methods and mean/mode-based imputations within these clusters (De Mántaras, 1991). These methods are based on a strict separation of objects into clusters, so it is assumed that instances in one cluster have no influence on the imputation process in other clusters.

Ayuyev et al. (2009) suggest an improved dynamic clustering-based imputation (DCI) of missing values in mixed-type data. They consider the appropriate choice of imputation method especially important when the fraction of missing values is large and the data are of mixed type. The proposed DCI algorithm relies on similarity information from shared neighbours, where mixed-type variables are considered together. Around each instance with a missing value, they deterministically construct an independent cluster of similar instances with no missing values for a particular attribute.
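The basic clustering premise described above, that imputations within one group are independent of the other groups, can be sketched as a cluster-wise mean imputation. This is a deliberately simplified illustration and not the DCI algorithm: the cluster labels are assumed to be given in advance, clusters are strictly separated, and only a continuous indicator is handled. Unit names, labels and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cluster_mean_impute(values, clusters):
    """Impute each missing value with the mean of its own cluster.

    values:   unit -> observed value, or None if missing
    clusters: unit -> cluster label (assumed to be known beforehand)
    """
    observed_by_cluster = defaultdict(list)
    for unit, value in values.items():
        if value is not None:
            observed_by_cluster[clusters[unit]].append(value)

    filled = {}
    for unit, value in values.items():
        if value is None:
            observed = observed_by_cluster[clusters[unit]]
            # Leave the gap if the cluster has no observed value at all.
            filled[unit] = mean(observed) if observed else None
        else:
            filled[unit] = value
    return filled


# Hypothetical indicator values for five units in two clusters.
values = {"A": 1.0, "B": None, "C": 3.0, "D": 10.0, "E": None}
clusters = {"A": "low", "B": "low", "C": "low", "D": "high", "E": "high"}
filled = cluster_mean_impute(values, clusters)
```

Because each cluster is treated in isolation, the value imputed for unit B depends only on A and C, and the value for E only on D; for a categorical indicator the cluster mode would take the place of the mean.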
In contrast to a typical clustering method, the DCI algorithm allows cluster intersections, meaning that the same unit may be included in many clusters. It relies on a distance measure that considers both categorical and continuous variables and is applicable to the estimation of missing values in high-dimensional mixed-type data.

Different authors propose and analyze other complex algorithms for missing data imputation. For example, Abdella & Marwala (2005) introduced a new method for imputing missing values which uses a combination of genetic algorithms and neural networks to approximate the missing data. Nelwamondo et al. (2007) compare neural network and expectation maximization techniques, while Lobato
