A COMPOSITE SCORE FOR A MEASURING INSTRUMENT UTILISING RESCALED LIKERT VALUES AND ITEM WEIGHTS FROM MATRICES OF PAIRWISE RATIOS

ABSTRACT
A methodology is proposed to develop a measuring instrument (metric) for evaluating subjects from a population that cannot provide data to facilitate the development of such a metric (e.g. pre-term infants in the neonatal intensive care unit). Central to this methodology is the employment of an expert group that decides on the items to be included in the metric, the weights assigned to these items, and an index associated with the Likert scale points for each item. The experts supply pairwise ratios of importance between items, and the geometric mean method is applied to these to establish the item weights, a well-established procedure in multi-criteria decision analysis. The ratios are found by having a managed discussion before asking the members of the expert panel to mark a visual analogue scale for each item.


INTRODUCTION
In this article, a methodology is proposed to develop a measuring instrument (metric) for evaluating a variety of conditions and situations. It is particularly valuable when the population to be evaluated cannot participate in the construction of the metric, e.g. for item reduction. It is generally useful in its use of weights to put the appropriate emphasis on the items included, and in replacing the equidistant Likert scale points with values that reflect severity. The methodology is presented in the context of a case study in which the stress levels in pre-term infants are to be measured.
The problem originated during a research study that aimed to measure pre-term infants' stress levels before and after developmentally supportive positioning, but an appropriate measuring instrument was not available (Hennessy, Maree & Becker 2007:3-11). Contrary to situations such as psychometric testing, in which the subjects (from the population for which a metric needs to be developed) participate, usually by way of a self-administered questionnaire, it is impossible for pre-term infants in the neonatal intensive care unit (NICU) to furnish data that can assist with the development of an instrument capable of measuring stress levels in the pre-term infant.
In this setting, the development of a suitable metric relied on inputs from two expert panels that contributed in three ways: determining the items that needed to go into such an instrument (first panel), allocating weights to the items, and re-scaling the Likert scale points for each item (second panel). Weights are needed because items very seldom contribute equally to the composite score of a metric; weights place emphasis on items according to their contribution.
We refer to Likert scale points (rather than values) and reserve the term values for the numerical values associated with the Likert scale points. For a particular item, one may associate with the five Likert scale points 0, 1, 2, 3, 4 used here the values 0,00; 0,12; 0,35; 0,72; 1,00, indicating that the condition associated with Likert point 2 is 0,35 on a scale from 0 to 1. The values are not equidistant, but attempt to represent the severity of the condition associated with the particular Likert point. In this example, the value associated with Likert point 3 is six times that of Likert point 1 (0,72 versus 0,12).

Input by expert groups
Initially, a group of experts (first panel) was chosen and consulted individually, by written correspondence and telephonic conversation, to determine the items that went into the metric (see Table 1, which illustrates the Hennessy Stress Scale for the Pre-term Infant (HSSPI)). The items included in the HSSPI were decided on by consensus among the members of the first panel. Subsequently, a second group of experts, hereafter referred to as the panel, was used to provide the information needed to estimate the required item weights and the numerical values to be associated with the Likert scale points.
The panel originally consisted of ten members, but three members were excluded because of collaboration and the untrustworthiness of their inputs, which showed external influences. The remaining expert group, consisting of m = 7 members, was provided with the 15-item HSSPI and, for each item individually, every point on the five-point Likert scale (0, 1, 2, 3, 4) was clearly described. To start with, panel members were asked to indicate for each item where the Likert scale points (1, 2, 3) lie on a visual analogue scale (VAS) beginning at 0 and ending at 4. The values for these points were set equal to their distances from 0 and were then re-scaled to fall between 0 and 1. For each of the 15 items, this resulted in the five Likert scale points (0, . . ., 4) being well described in terms of a medical condition, with a numerical value between 0 and 1 associated with each. The numerical values were supplied by the panel and correspond to the severity of the conditions.
The weights of the items remained to be found. A well-established method in multi-criteria decision analysis (MCDA) was employed: the use of pairwise ratios of importance (Belton & Steward 2002:132; Lootsma 1999:53). Each of the panel members contributed an own n×n judgement matrix A = (a_ij), with the ij-th element a_ij denoting the ratio between w_i and w_j, where w_i is the importance of item i; a_ij therefore represents the importance of item i relative to item j. The value of a_ij follows when the panel member denotes, with a single mark on a VAS, the importance of item i relative to item j. The mark divides a bar of standard length into a left part, denoting the importance of item i, and the remaining right part, indicating the importance of item j. The length of the left part divided by that of the right gives the numerical value of a_ij.
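The conversion of a single VAS mark into a ratio a_ij can be sketched as follows; the bar length of 100 mm is a hypothetical choice for illustration, as the article only speaks of a bar of standard length:

```python
def vas_ratio(mark: float, bar_length: float = 100.0) -> float:
    """Convert a VAS mark into a pairwise importance ratio a_ij.

    The mark divides the bar into a left part (importance of item i)
    and a right part (importance of item j); a_ij is left / right.
    """
    if not 0.0 < mark < bar_length:
        raise ValueError("mark must lie strictly inside the bar")
    return mark / (bar_length - mark)

# A mark at 75 mm on a 100 mm bar judges item i three times as
# important as item j: 75 / 25 = 3.0.
ratio = vas_ratio(75.0)
```

A mark exactly at either end would imply a zero or infinite ratio, which is why the sketch requires the mark to fall strictly inside the bar.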

Estimation of distances between Likert scale values
For item i (i = 1, 2, . . ., n), denote the distance on the VAS between the Likert scale value j (j = 1, 2, 3) and the left-most value 0, as suggested by panel member k, by d_{i,j}^{(k)}; then for item i the distance to j is estimated as the mean value over the m panel members, i.e.

d_{i,j} = (1/m) Σ_{k=1}^{m} d_{i,j}^{(k)}.
The Likert scale value 4 is situated at the right-most end of the VAS. Replace the original Likert scale values with weights equal to the latter distances, bounded by 0 and 4, re-scale these weights to range from 0 to 1, and denote them by w_{i,j} (i = 1, 2, . . ., n; j = 0, 1, . . ., 4).
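The averaging and re-scaling described above can be sketched in Python for a single item; the two panel members' VAS readings below are hypothetical:

```python
def likert_values(distances):
    """Average VAS distances over panel members and re-scale to [0, 1].

    `distances` holds, for each panel member, the distances of Likert
    points 1, 2, 3 from the left end of a VAS running from 0 to 4.
    Points 0 and 4 sit at the ends of the scale.  Returns the five
    values w_{i,0}, ..., w_{i,4} for the item.
    """
    m = len(distances)
    means = [sum(d[k] for d in distances) / m for k in range(3)]
    full = [0.0] + means + [4.0]      # all five points on the 0..4 scale
    return [x / 4.0 for x in full]    # re-scale to the 0..1 range

# Two hypothetical members placing points 1, 2, 3 on the 0..4 bar:
vals = likert_values([[0.4, 1.2, 2.8], [0.6, 1.6, 3.0]])
# vals is approximately [0.0, 0.125, 0.35, 0.725, 1.0]
```

Note how the resulting values need not be equidistant, matching the earlier 0,00; 0,12; 0,35; 0,72; 1,00 illustration.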

Estimation of item weights
The problem of calculating a preference vector from the ratios entered in a judgement matrix can be presented as a general linear model (Crawford & Williams 1985:393-4).
To find the weights of the n items of the metric, defined as the sum of the weighted scores for the n items, each of the m panel members compiled an n×n judgement matrix A = (a_ij).
Assume an underlying weight vector w^t = (w_1, . . ., w_n). The a_ij are estimates of ratios of the elements of w with random error. Panel member k (via the VAS) supplied estimates a_ij^{(k)} of w_i/w_j:

a_ij^{(k)} = (w_i / w_j) e_ij^{(k)}, i < j; i, j = 1, . . ., n. (1)

The indices point to the upper diagonal of A^{(k)} only, because a_ji = 1/a_ij. Taking logs of (1) gives

ln a_ij^{(k)} = ln w_i − ln w_j + ln e_ij^{(k)}. (2)

The item weights w_1, . . ., w_n, with w_i = exp(ln w_i), follow by determining the vector b that minimises the sum of squares, using ordinary least-squares regression software and not fitting the constant. Generally it is convenient to re-scale the weights w_i so that they add up to 100. As n increases, the n×n matrix A^{(k)} is enlarged and it becomes unfeasible for each panel member to assess every position in the matrix. It is, however, only necessary to consider the entries for which i < j in order to estimate the w_i. If this number is still too large, the estimation of the w_i can be based on a sample of the a_ij, selected in such a way that all i and j are connected. For the current problem, 61 of the possible a_ij were considered.
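For complete judgement matrices, the least-squares solution of the log-linear model coincides with the geometric mean of the matrix rows (Crawford & Williams 1985). A minimal sketch, using a hypothetical, perfectly consistent matrix rather than the study's data:

```python
import math

def item_weights(matrices):
    """Estimate item weights from panel members' judgement matrices.

    Members' replicate ratios are pooled by element-wise geometric
    mean; the row geometric means of the pooled matrix then give the
    weights, re-scaled here to add up to 100.
    """
    m, n = len(matrices), len(matrices[0])
    # Pool replicates: element-wise geometric mean over panel members.
    pooled = [[math.exp(sum(math.log(A[i][j]) for A in matrices) / m)
               for j in range(n)] for i in range(n)]
    # Row geometric means give the (unnormalised) weights.
    w = [math.exp(sum(math.log(pooled[i][j]) for j in range(n)) / n)
         for i in range(n)]
    s = sum(w)
    return [100.0 * wi / s for wi in w]

# A single consistent member with underlying w = (2, 1, 1):
A = [[1, 2, 2], [0.5, 1, 1], [0.5, 1, 1]]
weights = item_weights([A])
# weights is approximately [50.0, 25.0, 25.0]
```

For incomplete matrices, or when 'panel member' is added as a fixed effect, the general regression formulation of (2) is needed instead.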
A further generalisation of (2) is to allow the m panel members to fill in different cells in their A^{(k)}. The same comparisons by different panel members are regarded as replicates. With or without this generalisation, the assumption is that the panel members are able to supply replicates. In general, even the hand-picked members of an expert panel are not that similar, and the effect of panel members should be accounted for in the analysis. In this study, this was done by adding 'panel member' as a fixed effect to our model.
The metric or composite score, out of 100, for the HSSPI is

Σ_{i=1}^{15} Σ_{j=0}^{4} w_i w_{i,j} I_{ij}, (3)

where I_{ij} = 1 if item i takes the point j on the Likert scale and I_{ij} = 0 otherwise. The items are taken from Table 1, the values w_{i,j} from Table 2 and the weights w_i from Table 3.

RESULTS
The values w_{i,j} assigned to the Likert scale points by the expert panel are given in Table 2. The item weights w_i were determined using Stata Statistical Software Release 8.0 (StataCorp 2003), both when assuming panel members are replicates (Crawford & Williams 1985:395) and when including panel members as a fixed effect in the regression model, and are given in Table 3.
An example of how (3) is computed: suppose the Likert values for items 1 to 15 for a given infant were 1; 0; 3; 2; 4; 1; 1; 3; 3; 2; 0; 1; 3; 2; 1; the composite score, with panel member as a fixed effect, then follows by summing the products of the item weights and the corresponding re-scaled Likert values.

The HSSPI, now weighted, was then used in a pre-test/post-test design to observe infants prior to a specific positioning intervention, and again after the intervention (Hennessy et al. 2007:3-11). The composite score expressed stress level as a percentage. The pre-test stress scale was performed as the pre-term infant was waking up for the three-hourly routine before care commenced. Routine care was then done and, once routine care was completed, the pre-term infant was positioned according to specific principles with the use of the positioning aids. The infant was left for three hours without unnecessary disturbance. Before the following routine care commenced, the post-test stress scale was performed to determine whether the intervention had been successful. This would be confirmed if the stress levels measured lower on the post-test than on the pre-test stress scale. The results of the study were published in a previous article by Hennessy et al. (2007:3-11).
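The computation in (3) can be sketched with hypothetical weights and values; the study's actual numbers are in Tables 2 and 3 and are not reproduced here:

```python
def composite_score(weights, values, points):
    """Composite score as in (3): the sum of w_i * w_{i,j} over items i,
    where j is the observed Likert point for item i.  With item weights
    summing to 100 and Likert values in [0, 1], the score is a
    percentage.
    """
    return sum(w * v[j] for w, v, j in zip(weights, values, points))

# Three hypothetical items (not the HSSPI's tables), each with the same
# re-scaled Likert values, and weights summing to 100:
weights = [50.0, 30.0, 20.0]
values = [[0.0, 0.12, 0.35, 0.72, 1.0]] * 3
score = composite_score(weights, values, [4, 2, 1])
# 50*1.00 + 30*0.35 + 20*0.12 = 62.9 (a stress level of 62.9%)
```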

DISCUSSION
A wide variety of conditions and phenomena manifest themselves not as a single measurable attribute, but as a phenomenon that is intuitively understood yet not easily measured, mainly because there is no single attribute. In many such cases, researchers resort to constructing a composite score that takes into account a number of aspects (called items) of the phenomenon considered. The process starts with the nomination of (supposedly) all the relevant items and proceeds to cull them in what is called item reduction. Item reduction removes the items that do not contribute to the measurement because they are not relevant, or because they coincide with one or more of the other items and are made redundant by the presence of these items. Item reduction is frequently done by the statistical analysis of large samples of questionnaires. We propose the use of a panel of experts both for the nomination and the reduction of the items.
A panel of experts consists of a group of individuals with special knowledge or skill in a subject (The Concise Oxford Dictionary 1990:411), working together to produce a desired result. As discussed by De Vos (1998:180), the literature that exists in any discipline usually represents only a section of the knowledge of people involved in a specialised field on a daily basis. An expert panel can contribute the knowledge relevant to the metric to be constructed. Its members bring explicit knowledge and a wealth of experience that cannot be gleaned from any number of questionnaires filled in by the subjects of the study. In particular, they contribute to the clarity and relevance of the selected items (Gauthier & Froman 2001:301).
Once the items have been identified, the metric is constructed as a composite score, where the items (frequently) have a number of possible outcomes. Most metrics do not weigh the items, meaning that every item is as important as any other. We agree with Lynn (1986:382), who argues that different items contribute at different levels and have different content validity ratings or weights, and propose that an expert panel be asked for inputs into a process for the calculation of the weights. The process is based on a well-known and frequently used technique from multi-criteria decision analysis. We propose that the eliciting of answers be based on a VAS because of its direct appeal to the mental model being queried, thus avoiding various well-documented problems with semantic scales (Belton & Steward 2002:132; Lootsma 1999:53).
Many metrics allocate equidistant values to the different outcomes of an item, e.g. most metrics that use the Likert scale. Considering Table 2 above, it is clear that this is not necessarily correct and can even be seen as mostly wrong. It is possible to construct cases where this practice gives excessively wrong answers, but it is always preferable to eliminate even the small measuring errors. For this purpose we propose that the Likert scale be seen as consisting of an index (the Likert scale point) that takes on the values 0, . . ., m, a description (in this case study a clinical description) associated with each Likert scale point, and a value associated with the description and the Likert scale point. We also propose that the values be found by using a VAS. Finally, the metric is the weighted average of the Likert values of the items.
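The proposed three-part view of a Likert scale point, an index, a description and a value, can be represented directly as a small data structure; the clinical descriptions below are illustrative only and are not taken from the HSSPI:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LikertPoint:
    """One point of a re-scaled Likert scale, as proposed above."""
    index: int          # the Likert scale point, e.g. 0..4
    description: str    # condition associated with the point
    value: float        # severity on a 0-1 scale, found via a VAS

# Hypothetical entries for one item of a stress scale:
scale = [
    LikertPoint(0, "no sign of stress", 0.00),
    LikertPoint(2, "moderate sign of stress", 0.35),
    LikertPoint(4, "severe sign of stress", 1.00),
]
```

Keeping index, description and value together makes explicit that the index is merely a label, while the value carries the measurement.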
For the case study, the compilation of the panel was based on an identification of the disciplines relevant to the problem.This increased the reliability and validity of the study, as a wider perspective was achieved.Triangulation of all the main sources for the accumulation of the stress scale content was done to enhance the reliability and validity of the stress scale, including clinical observation, expert opinion, theory and empirical research.The stress scale was based on conceptual definitions and concepts of the research to ensure that one base of knowledge was used for research and instrument development.The stress scale was also pilot-tested to allow for revision and alteration before data collection commenced.
A possible shortcoming involving incongruence between study conceptualisations and scale content was reduced by providing the expert panel with the research proposal and stress scale for feedback prior to the meeting of the expert panel.

CONCLUSION
A methodology was discussed in which the employment of an expert group is central in deciding on the items to be included in the metric, the weights assigned to these items and, for each item, an index associated with the Likert scale points. The experts supply pairwise ratios of importance between items, and the geometric mean method is applied to these to establish the item weights, a well-established procedure in multi-criteria decision analysis. The ratios are found by having a managed discussion before asking the members of the expert panel to mark a visual analogue scale for each item.
This methodology is proposed to develop a measuring instrument (metric) for evaluating subjects in a population that cannot provide data to facilitate the development of such a metric (e.g. pre-term infants in the neonatal intensive care unit).
1) In matrix notation, (1) can now be expressed in the general linear model form y = Xb + e, with ln a_ij^{(k)} the general element of the observation vector y, ln w_i the general element of the coefficient vector b, and ln e_ij^{(k)} the general element of the error vector e. The design matrix X has a column for each item, and its elements take the values -1, 0 or 1: the row corresponding to a_ij^{(k)} contains 1 in the column for item i, -1 in the column for item j, and 0 elsewhere. The m panel members supplied replicates, and this was accommodated by the number of equations in y = Xb + e, which is equal to the total number of entries in all the matrices A^{(k)}.
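The construction of the design matrix X described above can be sketched as follows (0-based column indices; the comparison pairs below are illustrative):

```python
def design_row(i, j, n):
    """Row of the design matrix X for the comparison a_ij: +1 in the
    column for item i, -1 in the column for item j, 0 elsewhere."""
    row = [0] * n
    row[i], row[j] = 1, -1
    return row

def design_matrix(pairs, n):
    """Stack one row per supplied ratio.  With replicates from several
    panel members, the same (i, j) pair simply appears more than once."""
    return [design_row(i, j, n) for i, j in pairs]

# All upper-triangle comparisons for n = 3 items:
X = design_matrix([(0, 1), (0, 2), (1, 2)], 3)
# X == [[1, -1, 0], [1, 0, -1], [0, 1, -1]]
```

Regressing the vector of log-ratios on X without a constant, as described in the text, then yields the estimates of ln w_i.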

Table 1
Hennessy Stress Scale for the Preterm Infant -Items

Table 2
Expert panel weights for the Likert scale points

Table 3
Item weights following analysis of panel members' Subjective Judgement Matrices