Artemis Kostareli, September 2009
Supervisors: Dr Nick Voulvoulis and Dr Martin Head
As of September 2009, more than 2,950 soil and earthworm surveys had been completed. With such a large dataset, it is important to examine the reliability of the reported data to ensure it is valid and usable.
The aim of this study is to analyse the steps involved in quality control of the soil data submitted by the public to the OPAL website.
- Standardise data collection to make analysis and interpretation of OPAL results more reliable
- Determine how recordings vary among different people and at different times in order to identify bias and errors
- Assess the OPAL soil and earthworm survey as a whole and make suggestions for improvement
Soil was sampled at two different sites to determine the variability of records among different people and at different times.
The first site was the Natural History Museum (NHM) gardens, where the purpose was to determine how much the data varied among different users and so to assess confidence in the user. The second was Princes Gate Gardens, where two people carried out soil sampling to identify the variability of data through time and to check confidence in the OPAL survey as a whole.
The methodology for data quality analysis consisted of the following levels:
Level 1: Confidence of location
Level 2: Response rate/frequency of responses
Level 3: Discrepancies in responses
Level 4: Confidence of user through sampling at the NHM Gardens
Level 5: Confidence of survey through sampling at Princes Gate Gardens
Results and conclusion
Substantial conclusions can be drawn from the OPAL dataset and soil sampling:
- The majority of the OPAL soil and earthworm surveys were conducted near population centres, which provides information about urban soils and could potentially complement real soil science.
- A high percentage of the people who conducted the surveys (88%) appeared to appreciate the significance of soil and earthworms in the earth’s ecosystem - this meets the goal of DEFRA’s Soil Action Plan (2003) regarding public awareness of soils.
- The majority (over 80%) of reported coordinates fall within range of the reported postcode at each of the three intervals of 500, 1000 and 2000 metres. However, a high proportion of entries (approximately 57%) did not report the location of the survey.
- No statistically significant change in the two participants’ records could be detected in Princes Gate Gardens. Slight differences between the records of the two samplers through time occurred due to bias, abiotic factors (rainfall) and random errors.
- The variation in recordings among different people in the NHM Gardens is relatively small (for pH, 0.5 units). Bias and differing skills among the participants resulted in greater variation among the volunteers’ recordings in the NHM Gardens than in the Princes Gate Gardens recordings (two participants).
- Soil smell, colour, moisture and texture are quite subjective. Participants seemed to have different perceptions of these soil properties. For texture in particular, training will improve users’ ability to determine it.
- The data generated by volunteers corresponded to a range rather than exact numbers or values, which made it difficult to identify potential changes and support final conclusions. For example, soil texture at Princes Gate Gardens appeared to be between silty clay and silty clay loam.
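As an illustration of the location check mentioned in the conclusions above (reported coordinates compared with the reported postcode at 500, 1000 and 2000 metres), the following is a minimal sketch only. The summary does not describe how the study computed this, so the approach here is an assumption: each postcode is represented by a single centroid point, and distance is measured with the haversine formula. The function names `haversine_m` and `location_agreement` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_agreement(reported, postcode_centroid, thresholds_m=(500, 1000, 2000)):
    """Return, for each threshold, whether the reported coordinates lie within
    that distance of the postcode centroid (assumed representation of a postcode)."""
    distance = haversine_m(*reported, *postcode_centroid)
    return {t: distance <= t for t in thresholds_m}
```

For example, a reported point 0.01 degrees of latitude (roughly 1.1 km) north of its postcode centroid would fail the 500 m and 1000 m checks and pass the 2000 m check. Entries with no reported location would need to be counted separately, since no distance can be computed for them.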
Assessed against a number of criteria set out by the Citizen Group of NBII (National Biological Information Infrastructure) in the USA, the project appears to have met the requirements that render a citizen science project successful. Nonetheless, training of trial participants did not take place, and training is also a basic element in the success of a citizen science project. Participant training should therefore be taken into account when evaluating the project as a whole.
Determining the main aim of a citizen science project is a significant factor in assessing the quality of data returned from the public, as many studies have stated. Data collected by volunteers should therefore be validated in some way. Developing methods and patterns for validation will enhance the reliability of the data.
Factors that influence the quality of data include variability in participants’ age and background, time spent at the sampling site, skill levels and effort.