Graduate Skills Assessment

Stage One Validity Study

(Published on Internet Only)

Sam Hambur
Ken Rowe
Le Tu Luc
(Australian Council for Educational Research)

03/02
Evaluations and Investigations Programme

© Commonwealth of Australia 2002

ISBN 0 642 77295 9 (Internet copy)
DETYA No. 6911.HERC02A

This work is copyright. It may be reproduced in whole or in part for study or training purposes subject to the inclusion of an acknowledgment of the source and no commercial usage or sale. Reproduction for purposes other than those indicated above, requires the prior written permission from the Commonwealth available from the Department of Communications, Information Technology and the Arts. Requests and inquiries concerning reproduction and rights should be addressed to Commonwealth Copyright Administration, GPO Box 2154, Canberra ACT 2601 or e-mail commonwealth.copyright@dcita.gov.au.

This report is funded under the Evaluations and Investigations Programme of the Department of Education, Science and Training.

The views expressed in this report do not necessarily reflect the views of the Department of Education, Science and Training.

Executive summary

The Graduate Skills Assessment (GSA) test has been designed to assess a set of valued and widely applicable generic skills that may be developed through the university experience, and which are relevant to university achievement and graduate work.

This GSA validity study was commissioned by the Commonwealth Department of Education, Training and Youth Affairs (now known as the Department of Education, Science and Training).

The study addresses the validity of the first two Graduate Skills Assessment (GSA) tests, GSA Exit 2000 and GSA Entry 2001 (Stage One). A total of 3663 students drawn from nine broad fields of study across 27 Australian universities were involved in one or other of these tests.

This summary provides:

  • a description of the aims and scope of the study;

  • key background information;

  • a consideration of the study sample;

  • key findings and conclusions;

  • some recommendations for the future; and

  • concluding remarks that reflect on the ongoing challenges of the GSA.

Aims and scope (Chapter 1)

The validity study has the following major aims:

  1. To investigate the dimensional factor structure (discriminant validity) of the test;

  2. To identify variables related to differential performance on the GSA;

  3. To investigate the relationship between student performance on the GSA and other measures of student achievement;

  4. To consider the suitability of current reference ranges; and

  5. To evaluate the face/content validity of the GSA construct and items.

Background (Chapters 1, 2 and 3)

  1. The GSA grew out of an increasing interest in generic skills related to the need for an adaptable workforce in modern economies. Both employers and universities have an interest in generic skills, though they do not necessarily value the same skills equally.

  2. The GSA is based on an assumption that certain generic skills, though taught within a particular context, can be transferred to another context once there is sufficient familiarity with that context. It is expected that those with the highest levels of generic skills make such transfers most readily.

  3. The skill domains chosen for assessment by the initial GSA are: Written Communication, Critical Thinking, Problem Solving and Interpersonal Understandings.

  4. During test development, the focus of each domain was narrowed in a way that was expected to produce a test component that assesses a psychometrically coherent generic skill dimension.

  5. The focus of the GSA is on cognitive skills since these are more amenable to assessment. The test does not assess directly those personality traits that may be related to putting into action the relevant skills/understandings. It is hoped that longitudinal studies will indicate an association between the skills/understandings and outcomes.

  6. Each component of the GSA aims to present tasks that are generally meaningful, accessible and contextually appropriate, so that specialised knowledge is not required. Whereas Year 12 literacy and Year 9 numeracy are assumed, higher-level meta-strategic and meta-cognitive skills need to be applied.

  7. The Critical Thinking (CT) component of GSA aims to assess some markers of the ability to think critically about viewpoints and arguments. Students are expected to use comprehension, analysis and synthesis to assimilate and evaluate viewpoints and arguments. Partly to distinguish CT psychometrically from Problem Solving, material is presented in text format.

  8. The Problem Solving (PS) component of GSA aims to assess some markers of the ability to analyse and transform information as a basis for making decisions and progressing toward the solution of practical problems. Students are expected to show insight into the problem to identify and deal logically with key information. Analytical, logical and quantitative reasoning need to be applied. Partly to distinguish PS psychometrically from CT, the information is presented in low verbal and non-verbal formats.

  9. The Interpersonal Understandings (IP) component aims to assess the ability of students to show insight into the feelings, motivation and behaviour of others, and into approaches related to helping or working with others, such as effective feedback and teamwork. The information is mostly presented as text but some pictorial material is used.

  10. The Written Communication component aims to assess the ability of students to write effectively in two genres: Argument (ARG) and Report (REP). The Argument task requires students to develop a point of view about an issue and structure a clear, coherent and logical argument in support of that view. The Report task requires students to comprehend, select, organise and present clearly a summary report based on facts, figures and pictures presented in the stimulus.

Study sample (Chapter 4)

  1. A total of 3663 students drawn from nine broad fields of study across 27 Australian universities were involved in the first two GSA tests.

  2. Since the sample of students sitting GSA was largely self-selected, it is unlikely to be representative of the general university population. This is confirmed by observations of significant differences between the GSA and general university populations in terms of variables such as field of study composition and the proportion of students with English-speaking background.

  3. Because it is unclear how the deviations of the GSA population from the composition of the general university population will affect the results of this study, particular caution needs to be taken in drawing conclusions.

  4. It is expected that statistical methods based on linear relationships (such as correlation and linear regression) would not be greatly affected by the non-representativeness of the sample. Therefore, it is expected that general findings related to the factor structure of the test (Chapter 5), variables related to performance on the test (Chapter 6) and the relationship between performance on the GSA and other measures of achievement (Chapter 7) are likely to have significant validity.

Findings and conclusions

Factor structure and discriminant validity (Chapter 5)
  1. In support of test validity, confirmatory factor analysis indicates that the test does measure five coherent and distinguishable (discriminant) dimensions in line with the test construct. Thus, there is no reason to collapse or combine dimension scales, unless students within a narrow field of study are considered (who tend to perform similarly on components other than Problem Solving).

  2. A second-order factor is also observed on which each of the five dimension factors loads significantly. This second-order general factor may be related to a form of meta-cognitive general executive reasoning skill that can be applied to a range of tasks.

  3. For appropriate measurements and comparisons to be made between years, it is essential that the factor structure of the test is monitored and maintained.

Variables related to student performance on GSA (Chapter 6)
  1. Findings with respect to variables related to student performance on the test include the following:

  • There are distinctive profiles of student performance on the GSA components related to field of study that seem meaningful on the basis of known strengths of field of study groups (e.g., humanities students do relatively well on Writing and Critical Thinking).

  • When first-degree students are considered within fields of study, there is a statistically significant difference in GSA scores for all five components between first and third year students. This observation supports test validity but needs to be clarified by studies in which the same students are tested in first and third year.

  • Multivariate, multilevel analysis indicates that field of study, year level and familiarity with English (i.e., English-speaking background – ESB) appear to be related to performance on all five GSA dimensions. Gender seems to be related to performance on Problem Solving (with males doing better) and Interpersonal (with females doing better). Age seems to be related to performance on Problem Solving (with younger students doing better) and Interpersonal (with mature age students doing better). Other variables may be relevant but need further investigation.

  • The multivariate, multilevel models used (which consider field of study, English-speaking background, age, gender, school type and course year) explain about 30% of the variance in students’ GSA scores, with field of study being the largest single contributor. However, the majority of the variance is not accounted for by these models and seems to be attributable to other variables, including student-specific variables such as motivation and ability in relation to the skills assessed by the GSA.

  2. Whether variables such as English-speaking background, age and gender are related to test performance inappropriately is not clear. Studies need to be done to monitor whether performance on the test with respect to these variables matches performance at university and in graduate work.

  3. If student GSA achievement improves in a short period of time from first to third year, the GSA is likely to be assessing developing generic skills and not just a traditional fluid intelligence.

  4. Obtained samples were inadequate to provide suitable ‘value-added’ estimates for either universities or fields of study within universities.

Relationship between performance on GSA and other measures of student achievement (Chapter 7)
  1. Performance on the GSA should correlate with performance on similar tasks including those related to success at university and graduate work. An investigation was undertaken to examine relationships between GSA, tertiary entrance (TER) and grade point average (GPA) scores. Because universities have different types of entrance scores and ways of predicting academic success in courses, such analyses were done at the university level. At this stage it is too early to investigate the relationship between students’ GSA scores and their work performance.

  2. In support of test validity, the data collected suggest that student performance on each GSA component is significantly correlated (statistically) both with TER and GPA performance for most university cohorts. In most cases, the GSA-GPA correlation was as good as or better than the TER-GPA correlation. For cases where performance on the GSA did not correlate significantly with GPA, neither did TER.

  3. The predictiveness of the GSA components varied with the university cohorts, and this observation may be related to the field of study composition of the cohorts or other sample idiosyncrasies. Predictive validity studies that are focussed on individual fields of study could be informative.

  4. The fact that performance on a short test of generic skills like GSA correlates significantly with measures like GPA and TER, which are related to a wide range of curriculum knowledge and skills, suggests the importance of generic skills in academic performance and supports GSA validity.

  5. GSA-GPA correlations appear to be comparable to SAT-GPA correlations seen in the USA (see Footnote 1).

  6. Although data were available for only a handful of students, a small-scale study using a variant of the GSA (BMAT) tailored for entrance into a postgraduate business school found a statistically significant correlation between GSA Problem Solving performance and GMAT performance (see Footnote 2). It would be desirable to expand this study.

  7. Given the predictiveness of the GSA, it may be feasible for universities to use it as an additional predictor of performance for entry into undergraduate and post-graduate courses, perhaps weighting the GSA components to optimise the predictions (as is done with the Victorian General Achievement Test in another context). The GSA might also be used to provide university entrance score equivalents for students who do not have these.


Footnote 1. The SAT (Scholastic Aptitude Test) is a widely used test of general academic ability in the USA.

Footnote 2. The GMAT (Graduate Management Admission Test) is used for selection into many postgraduate business courses in the USA and other countries.
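
The comparison between GSA-GPA and TER-GPA correlations described in this section can be illustrated with a short sketch. It assumes that the statistic involved is the Pearson correlation coefficient, which this summary does not state explicitly. The function name and all scores below are hypothetical and were invented for illustration; they are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for one university cohort (NOT study data).
gsa = [52, 58, 61, 64, 70, 73]
ter = [71, 80, 76, 88, 85, 93]
gpa = [4.8, 5.1, 5.4, 5.3, 6.0, 6.2]

r_gsa_gpa = pearson_r(gsa, gpa)
r_ter_gpa = pearson_r(ter, gpa)
print(f"GSA-GPA r = {r_gsa_gpa:.2f}, TER-GPA r = {r_ter_gpa:.2f}")
```

Because universities use different entrance scores, a calculation of this kind would be run separately for each university cohort, in line with the approach described above.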


Evaluation of GSA reference ranges (Chapter 8)
  1. The GSA uses two main methods of indicating student performance on the Student Report Forms. One provides for comparison purposes the middle 60% of all student scores and the middle 60% of scores for students in similar fields of study. The other provides performance level descriptors, with student scores assigned to a level of performance.

  2. Because GSA reference ranges are related to the sample that sat the first two tests, as discussed previously, there is doubt about how well the reference ranges apply to the whole university population.

  3. As suggested by the TERs of participating students, reference ranges may be set high, with stronger students being over-represented.

  4. Reference ranges are likely to be most problematic for the individual fields of study where few students have participated so far and where the field is composed of smaller sub-fields whose students differ markedly.

  5. It should also be noted that, because insufficient data are available, current reference ranges for field of study groups do not take into consideration year level of students, and this is inappropriate.

  6. In consultation with universities, more representative samples should be sought for the purpose of refining reference ranges.

  7. In consultation with universities, further consideration could be given to the suitability of the described levels of performance.

  8. The reliability of GSA multiple-choice components is likely to be satisfactory for many purposes though not for others. For example, it may be satisfactory for measuring relatively small changes in performance for groups of students between university entry and exit. However, it may not be satisfactory for determining small changes in an individual student’s performance. The problem of test reliability would be most acute for the low and high ends of the reporting scale where there are few items to discriminate between students.

  9. For assessments at the low and high ends of the scale, and for other purposes, specialised versions of the test might be used.

  10. In order to improve test reliability for some purposes, it may be appropriate to reduce the number of multiple-choice components from three to two, one focusing on Analysis, Synthesis and Evaluation of information (addressing common elements of Problem Solving and Critical Thinking consistent with the initial stakeholder input) and the other on Interpersonal Understandings.
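
As an illustration of the first reporting method above, the "middle 60%" of a set of scores can be read as the span between the 20th and 80th percentiles. The sketch below assumes that reading; this summary does not state how the published reference ranges were actually computed. The function name and the scores are hypothetical.

```python
from statistics import quantiles


def middle_60_range(scores):
    """Return (20th percentile, 80th percentile) for a list of scores.

    Assumption: the 'middle 60%' is bounded by these two percentiles.
    """
    # n=5 splits the data into fifths, giving cut points at 20/40/60/80%.
    cuts = quantiles(scores, n=5, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical scores, invented purely for illustration (NOT study data).
all_scores = [48, 52, 55, 57, 60, 61, 63, 66, 70, 74, 78]
low, high = middle_60_range(all_scores)
print(f"Middle 60% reference range: {low} to {high}")
```

The same function could be applied to the scores of students in a similar field of study to obtain the second comparison range. With few participants in a field, the two cut points rest on very few scores, which is one way of seeing the concern raised in point 4.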

Review of test construct and items (Chapter 9)
  1. Various stakeholders and content experts were asked to evaluate the GSA construct and a sample of items.

  2. In general, the content experts in the various domains commented favourably on the face and content validity of the construct and items. However, they expressed general concerns about whether performance on the test would translate into university and workplace performance, and questioned the extent to which universities deliberately develop GSA-type skills (though such skills are mentioned in most university mission statements). In addition, there were specific concerns, such as those relating to the meaningfulness of some performance level descriptors.

  3. In general, the graduate recruiters seemed to respond positively to the test, suggesting it was relevant. However, when asked for their preferences, they tended to place the greatest emphasis on workplace skills such as applied interpersonal skills.

  4. In general, the students responded positively to the test overall, suggesting it was measuring important skills and could give useful feedback to students and universities. Nevertheless, some expressed concerns about fairness for students whose first language was not English, as well as querying validity and reliability.

  5. In the discussion with the group of other stakeholders, it was apparent that there were dramatically different views about aspects of the test, and to some extent these views were related to the background of the stakeholder (e.g. humanities academic vs engineering professional). In general, issues of concern for these stakeholders included: the possibility of league tables appearing; whether there are generic skills outside disciplines or work situations; privacy of results; whether universities actually teach such generic skills; limitations of multiple-choice items; relevance of interpersonal skills to researchers; audience specification and scaffolding for writing; relevance of the test to all university students; relevance to post-graduate work; cultural and ESL bias; and so forth.

Recommendations for the future development of the GSA (Chapter 10)
  1. Continuing attempts should be made in association with universities to obtain representative student data.

  2. The factor structure of the test should continue to be monitored to ensure that the test remains appropriately focussed.

  3. Further investigations should be undertaken to confirm, clarify and more precisely quantify relationships between performance on the GSA and variables such as field of study and year level, and to investigate the appropriateness of differential performance on the basis of variables such as English-speaking background and gender. Investigations broadening the range of variables examined could be done.

  4. Further investigations should be undertaken into the relationships between GSA performance and markers of achievement at university and work. Evidence could include reports on students and graduate workers by tutors and supervisors.

  5. Consideration could be given to the use of the GSA for selection into university courses.

  6. Reference ranges should be refined, including those for sub-groups, such as specific field of study and year level cohorts.

  7. There should be further evaluations of whether test reliability and described levels of performance are suitable for the particular purposes for which the results are being used. If reliability is not sufficient for a particular purpose, consideration should be given to ways of improving it.

  8. In consultation with stakeholders, consideration should be given to refining face/content validity and the construct and level descriptions, basing these, where possible, on a comprehensive and commonly accepted developmental model of generic skills.

  9. The purpose(s) of the test should be clarified in consultation with stakeholders and, if appropriate, versions of the test tailored for specific stakeholder purposes could be produced that are linked statistically to the general test.

  10. Assessment of validity should be ongoing as the test evolves and stakeholders should be involved in evaluation and research.

Concluding remarks
  1. The challenge for test developers of producing an appropriate theory-based and empirically validated test of generic skills that satisfies a range of stakeholders with competing demands is a substantial one. In relation to this, more discussion with stakeholders about the purpose, design and value of the test, as well as more opportunity for stakeholder involvement in test design and research, may be useful.

  2. Assessment of the validity of the GSA is a complex process. This study is a first step that provides evidence in favour of the validity of aspects of the GSA as it currently operates, but also raises some concerns. As the GSA evolves in response to feedback, ongoing assessment of validity will be required.

 
