WRC Ltd
AARDVARK
Aardvark User Guide

WRc Ref: UC7578.01
May 2011

RESTRICTION: This report has the following limited distribution:
External: Aardvark Licence holders

Report No.: UC7578.01
Date: May 2011
Authors: Ian Codling
Project Manager: Ian Codling
Project No.: 09193-0

© WRc plc 2011
The contents of this document are subject to copyright and all rights are reserved. No part of this document may be reproduced, stored in a retrieval system or transmitted, in any form or by any means electronic, mechanical, photocopying, recording or otherwise, without the prior written consent of WRc plc.

This document has been produced by WRc plc.

Any enquiries relating to this report should be referred to the Project Manager at the following address:
WRc plc, Frankland Road, Blagrove, Swindon, Wiltshire, SN5 8YF
Telephone: + 44 (0) 1793 865000
Fax: + 44 (0) 1793 865001
Website: www.wrcplc.co.uk

Contents

1. Introduction
1.1 Turning data into information
1.2 The potential Aardvark user
1.3 What is Aardvark?
1.4 Questions that Aardvark can answer
1.5 Statistical understanding via Aardvark
1.6 What Aardvark can't do
1.7 Progressing beyond Aardvark
1.8 Useful reference material
2. Installing Aardvark
2.1 Installing the Standalone version
2.2 Installing the Network version
2.3 Copy protection
3. Using Aardvark
3.1 The Aardvark menu system
3.2 Illustrative examples
3.3 Guidelines for using Aardvark
4. Preparing Data for Aardvark
4.1 Introduction
4.2 The data file
4.3 The control file
4.4 Trouble-shooting
5. Outputs from Aardvark
5.1 Printing
5.2 The Copy Graph facility
5.3 Exporting data

Appendices

Appendix A Using Aardvark - Illustrative Examples
Appendix B Statistical Tables
Appendix C Installing Aardvark

List of Tables

Table 1.1 Part of an archive retrieval data listing
Table 1.2 A typical data summary
Table 1.3 Typical output from a statistical routine
Table 3.1 The Aardvark tool bar - menus and icons
Table 4.1 Data and control files for Field Raynes STW
Table 4.2 Data and control files for Halder Brook
Table 4.3 Data and control files for Moss Keatose Water T. W.
Table B.1 Values of the correlation coefficient just significantly greater than zero for various numbers of samples
Table B.2 Minimum values needed for the correlation coefficient of the Normal probability plot before the Normality hypothesis can be entertained
WRc Ref: UC7578.01/09193-0 May 2011 © WRc plc 2011

1. Introduction

1.1 Turning data into information

Routine quality [1] data held on a computer archive looks something like this:

Table 1.1 Part of an archive retrieval data listing

DATE        DO     BOD   NH4-N
18/04/86   12.3    2.0   0.22
20/05/86    8.6*   1.0   0.08
04/06/86    7.1*   1.0   0.08
18/06/86    6.5*   2.0   0.14
03/07/86    7.3*   2.0   0.19
17/07/86    8.3*   2.0   0.04
01/08/86    9.3    2.0   0.29
15/08/86    9.7    2.0   0.12
01/09/86    9.5    2.0   0.08
15/09/86   10.1    2.0   0.06
30/09/86    8.9*   3.0   0.16
14/10/86   10.1          0.09
29/10/86   11.3    3.0   0.04
12/11/86   11.4    2.0   0.22
27/11/86   11.4    2.0   0.13
04/12/86   10.7    2.0   0.13
05/01/87   12.2    3.0   0.15
19/01/87   12.4    3.0   0.17

Not very informative, eh? Trying to pick out any messages from great blocks of data like that is a sure way to bring on a headache. The human eye is simply not very good at that sort of thing.

So data by itself isn't much use. We need a way of turning that data into information. Why? Well, what's the point of spending huge sums of money on sampling and analysis if the data isn't going to be fully used?

OK, maybe the cost is justified because the data is needed for statutory annual compliance reports. But if that's all you use it for, don't you have the niggling feeling that there must be something more you ought to be doing with the data? For example, hidden inside your historical data there's almost certain to be evidence of improving and deteriorating trends in quality. Doesn't it make sense to seek out that evidence?

[1] We'll be using the general term 'routine quality' to mean the quality of anything covered by a routine monitoring programme. Because of our background in the water industry, many of our examples are taken from rivers, sewage effluents and potable waters. But Aardvark can apply just as well to routine quality data in other industries.

A common way of getting information from data is to calculate various summary statistics. You'll all have seen examples like this:

Table 1.2 A typical data summary

               DO     BOD    AMM.NIT.
MINIMUM:       3.80   0.20   0.10
AVERAGE:       9.76   2.37   0.25
ST.DEV:        1.31   2.48   0.39
95PCILE:       11.0   6.46   0.90
MAXIMUM:       14.2   16.0   6.60
NO. RESULTS:   206    217    218

Still not what you might call the last word in information presentation. What is the table really telling us? What's a standard deviation anyway? (besides being a number you're always supposed to put in tables of scientific data...)

And things can go from bad to worse if we try to plug into an out-and-out statistical package. What's the average [2] reader to make of outputs like this?

Table 1.3 Typical output from a statistical routine

The analysis of variance for permanganate values at unpolluted stations around Locality 2

Source of variance   Sum of squares   Degrees of freedom   Variance
Between months             4998                7              714     F=3.1*
Between stations           1494                6              239
Error                      6621               29              227
Total                     13113               42              312

[2] Average: an informal term used by the man in the street to describe any measure of the 'centre' of a data set - like the arithmetic mean, or the median.

Spreadsheets? They're great for simple repetitive calculations and for producing attractive graphs. But they're not always the most suitable means of performing complicated analyses. The existence of a very wide range of options within the package can make it difficult to find the option you want, even if it exists.
And, anyway, how quickly can a busy quality manager get up the learning curve for a particular spreadsheet package? As for writing macros to repeat a particular analysis on other data sets, well, that calls for a professional programmer.

Are we striking a sympathetic chord? Then read on...

1.2 The potential Aardvark user

Before we start telling you what Aardvark is, just look through these few questions and tot up how many times you answer 'yes':

Question (Yes / No)
• Do you have routine quality data that you 'really must get round to analysing when you can find the time'?
• Does computing for computing's sake leave you cold?
• Does working with spreadsheets give you an uneasy feeling that some hidden error might have crept in?
• When you hear the word 'statistics', do you break out in a cold sweat or feel the eyes glazing over?
• Is the nearest statistician in your company 50 miles away (metaphorically at least) in Headquarters? [3]
• Have you tried statistical packages and been left wondering what the answers meant?

Yes score?
0 or 1   Hmm - well, carry on reading anyway...
2 to 4   Good - Aardvark could be just what you've been looking for.
5 or 6   Excellent - you're the perfect Aardvark user.

[3] Come to that, does your company even have a statistician?

1.3 What is Aardvark?

Let's just recap where we've got to:
• you do have data that you want to turn into information...
• ...but you don't have easy access to the specialist skills - both statistical and computing - needed to extract that information.

This is where Aardvark comes in.

Aardvark - standing for: 'Analyse any routine data; visually acquire real knowledge' - is a data interpretation package designed for non-statisticians. Aardvark doesn't need you to be a statistician because of the statistical planning that's gone into its design. The result is that the statistical machinery contained in Aardvark is low profile and house-trained.
It lives quietly in the background; it won't jump from behind a bush and bite you in the leg.

How does this vision of paradise actually work in practice? Well, Aardvark has three built-in features that combine to take a lot of the sting out of statistics:

1. Menus tailored to requirements

The structure designed into Aardvark means that you'll never be left stranded, like a beginner in front of a chess board, wondering which of a hundred possible moves might be the right one. Sure, you have a range of options. But simply by working through the options in sequence you'll find interesting patterns in your data and quickly learn which options are most useful for your sort of data. In any case, we'll be showing you which options are right for which types of inquiry.

And once you're working through a particular option you won't be asked baffling questions. Aardvark leads you through a logical chain of operations. There is the odd place where a technical question is necessary. But even then, Aardvark explains what the alternatives are, and will do something sensible whatever option you go for.

2. Use of colour graphics

Next, Aardvark utilises the familiar maxim that 'one picture is worth a thousand words' - or, in this case, a thousand numbers. Of course, it depends on the picture; not just any old picture will do. (Some business graphics packages, for example, think that the acme of achievement is a three-dimensional pie chart.)

The pictures in Aardvark are all specifically tailored to the types of question that you will commonly want to put to your data. They reinforce the underlying statistical analysis so powerfully that the conclusion in most cases leaps out at you from the picture alone.

It couldn't be easier. You simply click on the Tool bar to see the graph of your choice. Often this will provide all the information you require.
However, if you wish you can select further options to develop the analysis a stage further.

3. Automatic significance testing

Aardvark's third special feature is this: whenever you reach a point where it's appropriate to do a significance test [4], Aardvark does one automatically and puts its conclusion on the screen. So this means that you never have to start fumbling around in a set of statistical tables in the middle of an Aardvark session puzzling out what is meant by things like 't = -5.43' or 'F(1,73) = 3.39'. (Well, almost never - there are just a couple of places where you may sometimes need to consult Appendix B of the User Guide for the significance level of a correlation coefficient.)

1.4 Questions that Aardvark can answer

So when we come down to it, what exactly can Aardvark do for you? Well, these are some of the questions that you'll find easy to answer using Aardvark. We'll just introduce them here; then in Chapter 3 we'll start showing you the practicalities of how you actually get Aardvark to tackle them...

Q: Is there a seasonal pattern?

It could be important for you to know that there's a greater risk of failing to comply with a particular limit at certain times of year. Information of this sort would give you the option of redirecting your sampling effort to concentrate on just the critical months. Or you may want to know whether it would be useful for a certain sewage effluent consent to incorporate seasonal variations. Or, having found a seasonal pattern, it can be useful to deseasonalise the data to see whether the seasonality is hiding other sorts of trend.

Q: Is quality getting better? - or worse? And if there has been a change, was it gradual or sudden?

Questions of this sort come a close second in popularity to the perennial 'How many samples should I take?' One approach is to look at how annual averages vary from year to year - indeed, you can do this with Aardvark if you want to.
But the snag is that trends don't suddenly start up at year-end. And if they're subtle but persistent, will they show up in crude annual averages anyway? That's why Aardvark bases its trend analysis on the 'cusum' approach - of which more in Chapter 3...

[4] Significance test: a statistical procedure for determining whether or not an observed effect (a difference between two averages, say) was likely to have arisen simply through chance sampling fluctuations in the data.

Q: Is quality getting more - or less - variable?

This can be an important question for the reason that nearly all trend assessments relate to average quality. So a situation could arise with, say, a sewage effluent where routine monitoring showed that mean performance had remained at an acceptable level, and yet because quality had become more variable the effluent was actually running an increasing risk of failing its consent.

Q: Are we doing enough sampling?... or are we doing too much?

By itself, Aardvark can't answer these fundamental questions. But what Aardvark can do is give you a good understanding of:
• quality variations in the processes you're monitoring, and
• the information capabilities of your current monitoring programme.

With this key knowledge you're half-way towards deciding sensible, attainable monitoring targets for the future. Again, we'll expand on this important theme in Chapter 3.

Q: Are we justified in pooling data for several years?

This ties in with the previous question. If you can show that quality has remained stable over, say, the last three years, you'll be in the happy position of being able to quantify current performance more precisely, and contemplate cutting back on future sampling frequencies. So again, the benefits to be gained from a trend analysis are clear.

Q: Can we assume that quality follows a Normal [5] distribution?... or a log-Normal distribution?... or something else again?
Uh, oh, you're thinking: this is where the jargon starts to hit the fan. OK, let's just put it like this for the moment. For most objectives (looking for trends, say, or measuring compliance with standards), you can make more efficient use of your data if you happen to have some idea of which type of mathematical curve might reasonably describe the spread of variations in quality. But, once again, more of that in Chapter 3.

[5] Normal distribution: a symmetrical, bell-shaped curve widely used in statistics for describing the pattern of chance variations about a central, average value (e.g. Heights of Female Senior Citizens in Southport). The log-Normal is another distribution; although it's skewed, it's closely related to the Normal distribution.

1.5 Statistical understanding via Aardvark

All data arising from a routine monitoring programme is a mixture of:
• random variation, and
• systematic trends or patterns.

Statistics, in a nutshell, is the science of peering through the inevitable fog of random scatter and trying to identify which of the apparent systematic effects are genuine - that is, are so large that they are unlikely to have arisen by chance. For example, this year's mean BOD is virtually certain to be different from last year's, so the question is: how big must that difference be before we should start to sit up and take notice? That's what statistics is all about.

Why are we telling you all this? Well, does any part of your job involve the use and interpretation of quality data? If so, these basic statistical principles are essential background to you if you don't want to find yourself wasting time and money investigating apparent changes that, in many cases, have no real meaning. And that's where Aardvark brings a handy incidental benefit.
After you've been using Aardvark for a while, we think you'll find that - with no sweat on your part - you've acquired a good practical appreciation of the basic aim of statistical methods. Can you imagine this happening if you'd had to sit down with a textbook and a calculator?

1.6 What Aardvark can't do

So that we're not accused of bias [6] we ought to mention some of the things that Aardvark can't do. There are three main limitations.

[6] Bias: a statistical term describing a persistent tendency to under-estimate or over-estimate the true value.

1. Range of sampling frequencies

First, there's the question of what's meant by the Aard in Aardvark - i.e. 'Analyse any routine data'. Aardvark was designed to cope with typical water industry sampling frequencies - from, say, four to 100 per year. Aardvark could handle daily data at a pinch, though some of the graphs might look a bit cluttered. But it's not intended to be used for frequencies any higher than that. (Though if you were really determined, you could hoodwink Aardvark by coding each sample with a dummy date. So, for example, a week's worth of hourly data could be submitted to Aardvark as 168 rows of dummy weekly data.)

2. Types of data file

Next, Aardvark is limited in the types of data set it can accept. It expects the data to be in one of two formats:
• a fixed-format sequential ASCII file (we'll explain the jargon later); or
• a comma-separated-value (CSV) ASCII file.

So Aardvark cannot, for example, accept data directly from a spreadsheet or a database (though the CSV option makes it easy to export data in Aardvark-ready format from these sources). It's likely, therefore, that you'll need to set up a 'pre-Aardvark' routine which lets you extract the required data and write it to a suitable file ready for later access by Aardvark.
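To make the idea of a 'pre-Aardvark' routine a little more concrete, here is a minimal Python sketch that writes archive records to comma-separated text. It is an illustration only: the function name, the header row, the dd/mm/yy date style and the empty field for a missing result are all assumptions made for this example, not Aardvark's documented layout, which is described in Chapter 4. The sample values are two rows from Table 1.1.

```python
import csv
import io
from datetime import date


def write_aardvark_style_csv(rows, determinands, out):
    """Write (sample_date, {determinand: value}) records as CSV text.

    Hypothetical layout: one header row, one row per sample in date
    order, and an empty field wherever a result is missing.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["DATE"] + list(determinands))
    for sample_date, values in sorted(rows, key=lambda r: r[0]):
        line = [sample_date.strftime("%d/%m/%y")]
        for name in determinands:
            value = values.get(name)
            line.append("" if value is None else str(value))
        writer.writerow(line)


# Two records from Table 1.1; the BOD result for 14/10/86 is missing.
archive = [
    (date(1986, 10, 14), {"DO": 10.1, "NH4-N": 0.09}),
    (date(1986, 4, 18), {"DO": 12.3, "BOD": 2.0, "NH4-N": 0.22}),
]

buffer = io.StringIO()
write_aardvark_style_csv(archive, ["DO", "BOD", "NH4-N"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real routine the records would come from your archive retrieval rather than a hard-coded list, and the text would be written to a file instead of an in-memory buffer.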
In our experience, however, most organisations do already have such routines as part of their standard archive software, so we don't think this will present too serious an obstacle in practice.

3. Range of statistical techniques

Aardvark's third main limitation is in the range of statistical tests it can perform. (If you're not statistically minded, skip this next bit.) You won't be able to get Aardvark to do ANOVAs, GLMs, multiple regressions - or even t tests. You certainly won't be able to do Box-Jenkins (ARIMA) analyses. (That's getting very specialised; you have to go on a course even to find out how little you understand about it.)

But then, if you're interested in 'serious' statistics you'll already be using statistical packages for such analyses; and we certainly aren't trying to reinvent the wheel with Aardvark by offering that sort of statistical capability. We all know that there are lots of excellent statistical packages these days; the only problem, for most people, is in knowing how to use them.

So, Aardvark is aimed at the people who don't want to get entangled in specialist statistical software. But even so, we think that statisticians, too, will get a real benefit from the ease with which Aardvark lets you 'get the feel of' a data set. And of course, several features at the heart of Aardvark such as the integrated cusum routine and the deseasonalising option offer analyses that are not always easy to call up at the touch of a few buttons with other software.

1.7 Progressing beyond Aardvark

What if Aardvark fills you with such enthusiasm that you want to go on to higher things?

Your best move is to have a chat with your friendly local statistician. (There isn't one? Then complain to your boss.)

Another option would be to give WRc a ring. We could give some general advice, and would always be glad to hear your criticisms, or perhaps ideas for improving or extending Aardvark.
Have a go at reading introductory statistical texts. There are some very good ones, both classics and modern, and we've listed a few titles below.

You may also be interested in 'The Sampling Handbook' - WRc Report NS 29. This gives a lot of statistical background to quality monitoring in the UK water industry and it's also crammed with practical examples of the application of Aardvark to real data.

Lastly, you could try a pukka statistical package. Nowadays nearly all of them are very easy to use, though - as we keep warning you - they do expect you to know what it is you're wanting to do. But after some time with Aardvark, of course, that isn't likely to be so much of a problem.

1.8 Useful reference material

BISSELL, A.F. (1984) An introduction to cusum charts. 2nd edn. The Institute of Statisticians, 24 pp.

DAVIES, O.L. and GOLDSMITH, P.L. (eds) (1976) Statistical methods in research and production. Fourth revised edition. Published for Imperial Chemical Industries by Longman Group Limited, London and New York.

ELLIS, J.C. (1989) Handbook on the design and interpretation of routine monitoring programmes. NS29, Water Research Centre.

ELLIS, J.C. and LACEY, R.F. (1980) Sampling: defining the task and planning the scheme. Wat. Pollut. Control, No. 4.

HOOKE, R. (1983) How to tell the liars from the statisticians. Marcel Dekker Inc., New York, 173 pp.

HUFF, D. (1973) How to lie with statistics. Penguin Books.

KENDALL, M.G. (1976) Time series. 2nd edn. Charles Griffin & Company Limited, High Wycombe and London, UK.

MORONEY, M.J. (1962) Facts from Figures. 3rd edn. Penguin Books.

SIEGEL, S. (1956) Nonparametric statistics for the behavioral sciences. McGraw-Hill Book Company Inc., New York, 312 pp.

WARD, R.C., LOFTIS, J.C. and McBRIDE, G.B. (1986) The "data-rich but information-poor" syndrome in water quality monitoring. Environmental Management, 10, No. 3, 291-297.
2. Installing Aardvark

2.1 Installing the Standalone version

Once you have bought Aardvark, you will receive:
1. Through the post: an Aardvark Installation CD, and the Standalone Aardvark Quickstart guide.
2. By email: your Product Key. (This is a mixture of numbers and letters in the form of A4W_SINGLE-XXXX-XXXX-XX.) [7]

Step-by-step details of the installation process are set out in the Quickstart guide, and you can find a (slightly edited) copy of this in Section C.1 of Appendix C.

[7] The Product Key will have been emailed to a named contact within your organisation when Aardvark was purchased. If, for some reason, you cannot locate the Product Key for your installation of Aardvark, please email aardvark@wrcplc.co.uk to request another copy.

2.2 Installing the Network version

Once you have bought Aardvark, you will receive:
• Through the post: an Aardvark Installation CD, and the Network Aardvark Quickstart Guide.
• By email: your Product Key. (This is a mixture of numbers and letters in the form of A4W_NETWORK-XXXX-XXXX-XX.) (See footnote 7 if you do not have a Product Key.)

Step-by-step details of the installation process are set out in the Quickstart guide, and you can find a (slightly edited) copy of this in Section C.2 of Appendix C.

The installation process for the Network version of Aardvark is necessarily somewhat more complicated than that for the Standalone version. However, as you will need to enlist the help of your computer department to handle the technical details, this is something for them to worry about, not you.

2.3 Copy protection

Each copy of Aardvark is protected by CopyMinder®. This copy protection system monitors the use of each licensed copy of Aardvark via the internet to ensure that the software is used in accordance with the terms of each individual licence.

Standalone licences are aimed at single users of Aardvark, and CopyMinder® will allow one concurrent user of the software
even if it has been installed on more than one PC. CopyMinder® will disable a standalone licence if it is installed too many times.

Network licences are aimed at multiple users of Aardvark, and CopyMinder® will allow only the designated number of concurrent users of each licence to use the software at any one time.

CopyMinder® offers some additional functionality for network licences to allow the software to be installed and used on portable PCs not connected to the network. These are termed roaming licences, and further information about this is available by emailing aardvark@wrcplc.co.uk.

CopyMinder® will also provide copy protection for the software on PCs not connected to the internet.

3. Using Aardvark

3.1 The Aardvark menu system

Let's assume that Aardvark is already installed on your PC and that you've fired up Aardvark by double-clicking on the Aardvark icon.

The first thing you see when you first enter Aardvark - as you do with any Windows software - is a large expanse of blank screen. At the top of the screen are three horizontal bars: from the top, these are the Title bar, the Menu bar, and the Tool bar. Then at the foot of the window there's the Status bar.

The Title bar tells you that you're running Aardvark, and (when you've read it in) the name of the data set you're looking at. The Title bar also contains the various control-menu and sizing buttons that appear in all Windows applications. You'll see a similar bar at the top of each of the windows in which Aardvark produces its graphs and tables.

The Menu bar contains pull-down menus which let you select various Aardvark functions. The more important of these functions are duplicated by buttons on the Tool bar.

The Tool bar contains clusters of buttons that appear or disappear depending on what you are doing at the time.
When you click on a particular button, Aardvark will perform the associated function. For example, the leftmost button calls up the Help screens, and the second button from the left is used to open a data file for Aardvark to work on. Table 3.1 lists the function associated with each button.

Finally, the Status bar tells you the name of the determinand that you're currently looking at.

One final point about Windows. As you probably know, Windows often gives you several ways of doing the same task. For example, to quit Aardvark you can use any of the following procedures:
• Select File on the Menu bar and then select Exit in the drop-down menu;
• Click the Control-menu button in the top left of the Aardvark Window and then select 'Close';
• Double click the Control-menu button.

When we're describing Aardvark's functions, it would clearly be very tedious if we kept giving all the alternatives for each function, so we've generally standardized on the option of making selections via the Menu bar. But feel free to experiment with other options - and especially the short cuts provided by the Tool bar.
Table 3.1 The Aardvark tool bar - menus and icons

Menu command                           Result

File menu
Open Data File                         Opens an existing data file for analysis
Open CSV File                          Opens an existing comma-separated data file
Examine data                           Displays the data in a read-only spreadsheet
Quick Print                            Gives a quick print of the currently selected graph
Print                                  Allows you to print up to four graphs on one page
Copy Graph                             Copies the selected graph to the Clipboard

Tables menu
Overall Summary                        Gives summary statistics of all determinands

Single menu
Determinand summary                    Gives summary statistics and plots for the selected determinand
Confidence Summary                     Gives summary statistics with confidence intervals
Time series plot                       Gives a plot of data value against sample date
Histogram                              Gives a frequency distribution histogram of data
Year-on-year plot                      Overlays yearly plots of data value against month
Cusum plot                             Cusum analysis for identifying changes in quality
Normal probability plot                Displays a Normal probability plot for determining whether the data follows a Normal distribution
InterSample Times plot                 Gives a display of the number of days between consecutive sample dates
Autocorrelations plot                  Gives a correlogram for sample lags
Yearly statistics                      Gives annual summary statistics for all years
Selected yearly statistics             Allows you to select years for an annual summary statistics report

Pair menu
Scatter plot                           Gives a scatter plot of values for two determinands
Double time series (Separate scales)   Plots the data values against sample date with a different axis scale for each determinand
Double time series (Common scale)      Plots the data values against sample date with the same axis scale for both determinands

Help menu
Contents                               Opens the help contents screen

3.2 Illustrative examples

Most people who get their hands on a new piece of software want to try it out at once. (Well, it beats working.)
To let you do this with Aardvark we've given you three data sets that you can explore - first in parallel with the lines of enquiry we suggest in Appendix A, and then, if you like, off your own bat.

You'll get two benefits from this. First, it lets you put off the minor chore of setting up some of your own data sets (via archive retrievals or whatever). Also, you'll find that you start getting a lot of useful ideas about how you might look at your own data when you do get it on your PC.

The data sets we've provided along with Aardvark are:
• Field Raynes Sewage Treatment Works (in file FIELD.DAT);
• Halder Brook at Carousel Lane (in file HALDER.DAT); and
• Moss Keatose Water Treatment Works (in file MOSS.DAT).

These are all genuine examples of routine quality data; only the names have been changed to protect the innocent. The data sets are getting a bit old now, but this is no drawback because the principles of analysis apply to data of any age. We did think of changing the dates to make the data appear more recent, but decided against it on the grounds that this would lose links with real historical events such as droughts and strikes.

For each data set in turn, we've conducted a typical Aardvark investigation that you can follow, step by step, on your own machine. Of the three investigations, our exploration of the Field Raynes Sewage Treatment Works data is the most extensive, and this takes up Sections A.1 to A.4 of Appendix A. The Halder Brook investigation is covered by Sections A.5 and A.6. Finally we have a look at the Moss Keatose data in Section A.7.

Layout

This is how we've laid out the examples:

        Instructions telling you what to do next at each stage will be shifted to the right like this and printed in Arial font.

Items to be selected from menus are indicated by a change to bold, as in the example: 'Select Exit from the File menu'. Buttons or keys are indicated in bold and by underlining as in 'Click OK'.
Whenever we want to explain a particular graphical display, we've put the description inside a box like this:

    Screen Number: Name of Plot
    There will generally be a bitmap of the graph in the box.

Where we have other more general comments – like discussing which option we might explore next, maybe, or 'talking you through' a particular sequence of responses – these appear as free-standing blocks of text like this paragraph.

The three sets of enquiries will give you a good insight into:
• the types of question that Aardvark can help you to answer; and
• the practicalities of how you actually get Aardvark to address them.

Now go and try them...

3.3 Guidelines for using Aardvark

When you've been through a few of the examples in Appendix A you'll find that you quickly get a good idea of the sorts of things that Aardvark can do. So we're ending the chapter by gathering together that experience in a more structured way.

Below, we've listed nine common questions that you might want to ask of your data. Then under each question we've:
• outlined the main Aardvark routines that will be of help; and
• discussed the key points to look for in interpreting those Aardvark screens and so arriving at your answer.

Naturally we can't cover everything in this introductory chapter. So don't forget that Aardvark's full range of functions is described in the Help text. (And of course you'll discover more features for yourself as you get to know Aardvark better.) But for the moment, the questions discussed below will give you a sound working knowledge of how Aardvark can help you 'visually acquire real knowledge'.

OK, then; let's assume that:
• you're in Aardvark;
• you've called up the data set you need; and
• you've selected the determinand you're interested in.

We'll also assume that you've windowed in on a suitable time span - by using the Restrict by Date option if necessary.
Q1: Is there any seasonal pattern?

1. Look at a Year-on-Year plot. If this is a complete jumble, you've evidently got little or no regularly repeating seasonality. But if there's a persistent tendency for the data to be higher or lower in some parts of the year than others, then you do have a seasonal component.

2. If you think you spot seasonality, try fitting Aardvark's seasonal model. If it looks to be a good model, deseasonalise the data before going on to look for other patterns.

3. As you look at the data points, you may see that some years are appreciably higher or lower than others. If this effect is sufficiently strong to spoil the 'tightness' of the seasonal pattern, try extracting the long-term trend via the Cusum routine, saving the residuals, and then redoing the Year-on-Year plot using the residuals. (Try to avoid picking out the seasonality in the Cusum, though.)

4. You might also look at an Autocorrelation plot for more quantitative evidence – though you do need the data to be fairly evenly spread through time for this to be useful. A regularly repeating wave pattern with a 'sensible' periodicity (e.g. annual) is a good indication of seasonality.

Q2: Is quality getting better - or worse?

1. Have a general look at the data by calling up a Determinand Summary. If the time series is noticeably spiky, or the histogram is appreciably skewed to the right, it's probably a good idea to log-transform the data before proceeding. (Use the Histogram routine to check out that idea if you like - though remember that it's the behaviour of the data after removal of trends that we hope is going to be log-Normally distributed.) Also look at the SDD value; the smaller this is in relation to the standard deviation, the greater is the non-random component of variation lurking in the data waiting to be discovered.

2. Have a quick look at the Time Series plot.
The amount of variation shown by the annual means will give you a good feel for whether it's worth proceeding to a cusum analysis. (It usually is.)

3. Call up the Cusum routine. Don't waste time fitting too elaborate a model the first time around; just go for the most obvious hinges. Then if that simple model is statistically significant, you can always explore more complex variants in subsequent cusum runs.

4. Remember, don't rely too much on the automated cusum fitting routine. Although it usually finds the obvious hinges, it's not fool-proof. For example, it sometimes fails to find obtuse zigzags, i.e. pairs of hinges with opposing, but not very sharp, slope changes. So, do try fitting some hinges manually. It helps in getting an understanding of cusums. If you get it wrong, you can always click the Reset button.

Q3: Are we justified in pooling data over several years?

1. If, using the approaches outlined above, you've found no evidence of either long-term trend or seasonality, you're justified in pooling data over as many years as you like (within reason).
• Advantages... Your summary statistics – means, 95%iles, etc. – will be more precise than if you had looked just at the most recent year. Over four years, for example, they'll be roughly twice as precise.
• Disadvantages... There almost certainly will be differences between years; it's just that they aren't detectable at your present sampling frequency. So when pooling, keep in mind the inherent limitations of your current monitoring programme. (For more on this, see the discussion of Q9...)

2. If you've found no significant long-term trend but have detected a seasonal component, you're still justified in pooling several years' data provided you're only interested in statistics on a year-wide basis - annual mean load, for example, or annual % compliance with a standard.

Q4: What statistical distribution does the determinand follow?

1.
The Histogram routine lets you fit and test the 'goodness of fit' of the three commonly used distributions, namely:
• the Normal distribution;
• the log-Normal distribution; and
• the shifted log-Normal distribution.
For background information on these distributions and how to interpret the significance tests, see the Aardvark Help screens.

2. Why should you want to fit a distribution anyway? Well, the main reason is that many of the common statistical tests (e.g. t tests, F tests) assume that the data has come from an underlying Normal distribution. For example, the method used by Aardvark for testing the significance of changes in slope in a cusum assumes that the data is at least roughly Normally distributed about the local mean. So if you want to make use of these sorts of test, whether for analysing past data or in planning future sampling, you need to have some idea of the types of probability distribution that various determinands are likely to follow. (Though, to be honest, absence of any such knowledge has rarely stopped people in the past.)

3. You'll often encounter data sets whose histograms bear no resemblance to any of these three standard families. This is quite 'normal'; do not adjust your set. One of the secondary functions of Aardvark, in fact, is to provide users with an easy method by which they can test out various theories that do get put forward about the distributions habitually followed by various determinands.

4. In cases where the histogram does seem markedly non-standard, look at it in the context of the temporal structure of the data by using the 'sideways histogram' option of the Time Series routine. The more seasonality or long-term trend there is in the data, the less likely it is that the overall data set will follow any particular commonly recognisable theoretical shape.
If the dominant component should turn out to be long-term trend, it can be useful to see whether trend-corrected quality is nearer to being Normally distributed. You can do this by running the Cusum routine, saving the residuals (that is, the deviations from the trend model), and then running the Histogram routine on the residuals. It would also be sensible to try a similar approach when the main systematic component turned out to be seasonality. In this case, deseasonalise the data and run the Histogram routine on the new deseasonalised determinand.

Q5: Are we complying with a percentile limit?

1. Use the Time Series routine with the 'values above limit' option to see a graphical demonstration of how quality has performed against the limit over the years. Often, you'll see a trend in compliance performance which is of course hidden by the overall, or pooled, performance figure (see the red tail of the sideways histogram).

2. Instead of looking at percentage compliance with the standard, you may want to estimate percentile quality for yourself and compare that number with the limit. To do that, first use a Time Series plot to decide over what period it is sensible to pool the data. For example, if quality was much worse three years ago, it would be foolish to include data for that period in a current assessment. So if necessary, use the Restrict by Date option to window in on a suitable subset of the data.

3. Then you have three choices:
• Use the Histogram routine with the 'show the percentiles' option to see if any of the three standard distributions provides an acceptable fit. If it does, you can read off the percentile from the screen.
• Alternatively, you can call up a Determinand Summary and select parametric percentile estimates for the Normal or Log-Normal distribution.
• Finally - if the Normal or Log-Normal approach seems unsatisfactory - you can call up a Determinand Summary and use the non-parametric estimate given there.
This is the method we recommend in cases of doubt.

Q6: Is quality getting more - or less - variable?

1. Aardvark has a special feature specifically designed to help answer this question. Select Transform Single Determinand from the Data menu and select the 'SQRT(range of consecutive pairs)' from the list of transformations. Apply this to your determinand Z to generate a new determinand – NewZ, say. (If at this point you sneak a look at NewZ by calling up a Time Series, don't be put off by its alarmingly spiky appearance.)

2. The smart thing about this dodge is that a change in the variability of Z will result in a corresponding change in the mean level of NewZ. So now you simply do a Cusum analysis on NewZ; and if this picks up any changes in mean levels, you can deduce that the variability in Z changed significantly at around the same times.

3. If you want to quantify those changes, you'll need to run a Cusum analysis on the original determinand Z, putting in dummy hinges at the points you earlier identified in the NewZ cusum. The summary statistics table at the end of the routine will then automatically give you the relevant standard deviations.

Q7: Are two determinands correlated?

1. First, of course, you'll have to select the two determinands you're interested in. (If you intend to fit a regression model, make sure that you select your 'Y' before your 'X' determinand.)

2. For a general look at the two determinands, run the Double Time Series (Common Scale) routine. (This can be particularly informative for sewage effluent BOD and SS data; SS values are generally only a little higher than the BODs, and so you tend to see a tramlines effect.) Where the scales of the two determinands are too dissimilar, you'll need to call up the Double Time Series (Separate Scales) plot instead.

3. Next, ask for a Scatter Plot.
Then, if you're specifically interested in the degree of linear association between the determinands, select the Regression option. This will give you the correlation coefficient and other relevant statistics, as well as the regression line itself. Because of the dangers of misinterpretation, Aardvark shrinks from telling you whether or not the fit is significantly better than could arise through chance (though we do give a table of test values in Appendix A). In our opinion, the human eye is usually a much better judge than a formal significance test as to whether a regression line is of any practical value.

4. You can use these same Aardvark routines to look not just at correlations between two determinands, but at how the same determinand varies:
• over a fixed time interval – e.g. this week v. last week. (To do this you would first need to generate new 'lagged' determinands; you'll find that these can easily be set up with the help of the relevant Transformation option.)
• between two locations (e.g. raw v. final quality). (You would, however, need to arrange for the data for both locations to be held on the same computer file – as, for example, in MOSS.DAT.)

Q8: Has there been a linear trend over a particular time period?

1. First we issue a General Health Warning: a statistically significant linear regression of quality against time is no proof that the true underlying trend over the period analysed is in fact linear. For example, a significant result can be triggered by a step change, or a quadratic trend, or even a single mischievously-located outlier. So do bear that in mind whenever looking for linear trends.

2. OK, then; to begin with, it's wise to look at a Time Series plot to check that there is some evidence of a linear trend. Then you may need to use the Restrict by Date function to window in on the required time period.
Alternatively, you may discover one or two outliers obviously distorting the picture; you can temporarily remove these using the Restrict by Value function.

3. Next you'll need to generate a 'time' determinand. Go into the Data menu and select the 'Create a Determinand from Sample Dates' option. Let's suppose that you give the resulting determinand the name: TimeX.

4. Now you can call up the Scatter Plot routine from the Pair menu. You'll need to have selected two determinands; specify the quality determinand of interest first, and then TimeX. Aardvark will draw a (somewhat foreshortened) time series, and you can then test for a linear trend by asking for the regression option.

Q9: Are we doing too much sampling - or not enough?

These questions - and their close cousin, "How many samples should we take?" - crop up more than any other. Sorry to disappoint you, but we can't answer them here. What we can do, though, is to suggest a few ways in which Aardvark can give you a clearer understanding of the capabilities of your present monitoring programme as far as trend detection is concerned. And when you've got a good idea of the sort of trend objective that the present system can meet, you're half-way to deciding what it should try to achieve in the future.

1. Exploiting a failure to detect past trends

Suppose that you've carried out a Cusum analysis for a particular determinand and found no trend over the last few years. Use the Yearly Statistics function to generate a table of annual means for that determinand, and look at how much the means vary from one year to another. Although these variations indicate apparent changes in quality, the Cusum analysis has told you that you're not justified in regarding those changes as real; they may well be due simply to chance sampling error. Your reaction to this may be to shrug your shoulders and say "So what?". If so, that's a sign that you may well be doing enough - or even too much - sampling.
But what if you're taken aback at how large some of the changes are? The message then is that you may be failing to detect changes in mean quality that would be of concern; and that is a clear signal that you should perhaps be doing more sampling.

Yes, we know what you're thinking: "There go the statisticians again, always droning on about how we need to take more samples". Not so. The opposite situation can often arise...

2. Exploiting information about past trends

You may have done a Cusum analysis and identified all manner of step changes in mean quality which you know are of no practical significance at all. The message then is that you have a better capability for trend detection than is necessary, and so it's worth considering whether you really do need so high a sampling frequency. (And by the way, 'historical precedent' is, in itself, a rather feeble justification for maintaining the status quo; so don't feel obliged to be constrained by that well-worn strait-jacket.)

3. Exploiting autocorrelations through time

Where the sampling frequency has historically been high – say weekly or greater – lagged Scatter Plots can give you a useful insight into whether such intensity of monitoring is really needed. Suppose you find that a plot of this week's quality v. last week's quality has a very high correlation, and closely hugs the 'Y = X' line. In that case, if this week can be predicted from last week so well, why not drop back to fortnightly? If, on the other hand, the correlation is poor, it is then worth enquiring whether anyone takes any sort of control action on the basis of each week's figures. And if they do, maybe the present frequency is actually not high enough.

4. Exploiting correlations across a system

Where quality is measured at two points along a system (as, for example, with raw and final water), again Aardvark can often highlight opportunities for cutting down on sampling.
Taking as your two 'determinands' the same determinand at points A and B, look at them in a Double Time Series plot; and then look at them in a Scatter plot using the 'y = x' option. You may find that quality is so similar at the two points that one of the sampling points is unnecessary. Or maybe quality is always better at the second point, and nobody ever uses the first set of data for anything anyway.

Depending on the circumstances, all sorts of possibilities may present themselves. Always, however, the trick is to present the data in such a way that those ideas start to germinate; and that is where Aardvark can be such a valuable catalyst.

4. Preparing Data for Aardvark

4.1 Introduction

This chapter describes how you set up the data for an Aardvark investigation. Aardvark requires two input files:
• the data file itself; and
• a control file, telling Aardvark everything it needs to know about the layout of the data file.

There are two permitted formats for Aardvark files: fixed-format ASCII, and comma-separated-value (CSV) ASCII files. We describe the make-up of data and control files in the following two sections. First, however, we'll discuss briefly the route by which we think you're most likely to acquire your data.

Archive retrievals

Virtually all organisations which collect routine quality data store the observed values on some form of computer archive. All archive systems offer some data analysis facilities. For extensive investigations, however, a common arrangement is for the user to request an 'archive retrieval', whereby data for specified determinands within specified date limits is extracted from the archive and written to a separate computer file. You'll therefore need to have access to a routine procedure by which data can be down-loaded from the archive computer to your own PC.
Standard software generally exists for this purpose, and if you haven't already established a link of this sort for other applications, your computer archive staff will certainly be able to help you.

It's not obligatory, of course, for your data to have been produced by an archive retrieval. But we do assume that you have prepared your data file by some means or another, because Aardvark provides no facilities for direct data input.

4.2 The data file

Name of data file

You may use any three-letter combination for the file extension (apart from .CTL or .CSC, which are reserved for control file names). However, it's sensible to follow the usual convention of '.DAT' for fixed-format and '.CSV' for comma-separated files.

The root name of the data file may be any combination of letters and numbers. It may contain any number of characters (within reason). However, it is not good practice to have absurdly long file names - not least because it clogs up the file listings in Windows Explorer. It is even more foolhardy to include blanks in the root name (a particularly silly feature allowed by Windows), as this almost invariably leads to problems.

The data set illustrated (in part) in Table 4.1 is called FIELD.DAT.
Other examples of valid names are AVON8088.DAT, Finnbarr0091.CSV, and CamAtKingsBridge.DAT.

Table 4.1 Data and control files for Field Raynes STW

Portion of data file FIELD.DAT

Field Raynes Sewage Treatment Works
                               S.S(105)  BOD(ATU)   Amm Nit
Data provided by Herbert Wardrobe of Central WA on 9-v-86
Sent on mag.tape in format WQ17/32b (Option IV)
 09/01/73 1330 ---T  21 KL003    28.000     3.200
 14/02/73 1215 ---T  22 IL025    32.500     7.000
 01/03/73 1200 ---T  22 NC080    57.000     8.100
 02/04/73 1100 ---T  22 NC105    52.000     6.200
 03/05/73 1440 ---T  22 JC062    59.600     6.300
 19/06/73 1430 ---T  11 MC065    23.200     2.400
 03/07/73 1315 ---T  11 MC086    22.800     3.000
 30/07/73 1015 ---T  11 NC199    29.000     0.900
 05/09/73 1345 ---T  11 JC124    18.000     1.800
    :                                 :         :
 27/11/84 1200 D--T  11 PW592    22.000    12.800     2.700
 01/04/85 1445 ---D 522 00000    56.000    20.000     4.500
 15/05/85 1715 ---D 522 00000    36.000    18.400     1.600
 11/06/85 1445 ---D 521 00000    25.000    11.000     3.400
 04/07/85 0920 ---D 522 00000    33.000     9.000     4.100
 13/08/85 0920 ---D 521 00000    20.000     6.000     2.300
 11/09/85 1010 ---D 521 00000    17.000     8.600     2.400
 16/10/85 1420 ---D 521 00000    21.000     7.300     2.200
 28/10/85 1400 ---D 521 00000    19.000    10.000     6.600
 28/11/85 1410 ---D 521 00000    24.000     8.500     7.500
 21/01/86 0001 ---D 521 00000    25.000    11.500     1.900
 06/02/86 0930 ---D 521 00000    42.000    11.600     0.900
 03/03/86 1015 ---D 521 00000    26.000    23.500    10.300
 19/03/86 0845 ---D 521 00000    40.000    16.000     1.700
                                   999.      999.      999.

Control file FIELD.CTL

Field Raynes Sewage Treatment Works
3
S.S(105)
BOD(ATU)
Amm Nit
4 1 2 3 999 0 -1 .5
(1X,I2,1X,I2,1X,I2, 20X, 3F10.0)

Structure of data file

At the start of the data file you may have any number of rows of text information - title of the data set, determinand names, units of measurements, comments, whatever you wish. In FIELD.DAT, for example, there are four rows of text: the data set title, the determinand names, and then two rows of comments.
After this initial header information, the file must contain a number of 'fields', or 'vertical blocks', of data – as illustrated in Table 4.1. Of these,
• Fields 1, 2 and 3 contain the sample dates (day, month & year in any specified order) moving forward through time (i.e. the file must be in date sequence); and then
• Fields 4, 5, 6, etc contain the data – one field per determinand.

Years may be defined as either two-digit or four-digit numbers. If presented with two-digit numbers, Aardvark will assume that the four-digit year lies in the interval 1950 to 2049, and so will correctly interpret "00" as being the year 2000.

If the file should happen to contain other fields that you aren't interested in, that's OK; you can simply tell Aardvark to ignore them. In FIELD.DAT, for example, there's a lot of clutter that we'll be wanting to skip over between the date and the first determinand of interest (see the " 1330 ---T 21 KL003" sequence in row 5 of Table 4.1).

Format of data file

As we've already mentioned, Aardvark requires the fields of interest to be in either fixed or comma-separated format. We'll get on to the details shortly when we describe the control file; but we'll just mention here that the word 'fixed' is a bit misleading – if you do choose the 'fixed format' option you still have plenty of choice over the width and spacing of the data fields.

Size of data file

The data file may contain any number of determinands, subject to a maximum file width of 256 characters. Up to 50 determinands can be read by Aardvark in any one call of the 'Select a Data File' option. Once the data set has been read in, Aardvark actually allows you up to 100 determinands (with a few of these reserved for internal use), so you have plenty of scope for generating new determinands such as logged or date-restricted versions of the input determinands.
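As a small aside on the two-digit year convention described earlier in this section, the rule can be written out in a few lines of Python. This is only a sketch of the stated 1950 to 2049 window, not Aardvark's own code, and the function name expand_two_digit_year is a hypothetical one chosen for illustration.

```python
def expand_two_digit_year(yy: int) -> int:
    """Map a two-digit year onto the 1950-2049 window described in the guide.

    Hypothetical helper for illustration only; not part of Aardvark.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year in the range 0-99")
    # 50..99 become 1950..1999; 0..49 become 2000..2049
    return 1900 + yy if yy >= 50 else 2000 + yy


print(expand_two_digit_year(73))  # a FIELD.DAT sample year
print(expand_two_digit_year(0))   # the "00" case mentioned in the text
```

If you are preparing a file by script and want to avoid any ambiguity, writing four-digit years in the first place is the simpler option.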
Each row of the data file (after the header text) refers to a particular sampling date. Aardvark will accept up to 1000 rows of data – that is, data for a maximum of 1000 samples (see footnote 8).

End of data file

You have the choice of:
• letting Aardvark detect the end of the data file automatically; or
• using a row of dummy determinand values ("999.", for example) to flag the end of the data file.

Missing values

Gaps in the data file, known as 'missing values', most commonly arise because a particular sample was not analysed for the full set of determinands. You have the choice of representing missing values by:
• a zero;
• any other convenient numerical value - for example, "-99";
• an asterisk; or
• a blank.

Less-than values

Some of your data values may be reported as being less than the analytical limit of detection. You have the choice of flagging these in the data file by:
• the symbol "<" (e.g. "<0.05"); or
• negative numbers (so, for example, "-0.05" would stand for "less than 0.05").

Greater-than values

In data retrieved from a quality archive, data values may occasionally be flagged by the symbol ">", indicating that they are greater than the analytical limit of detection. As there is no standard statistical procedure for adjusting the data in such cases, Aardvark simply ignores the ">" sign and accepts the data at face value.

Footnote 8: If you have more data than this, it may well be that you should be thinking of using something other than Aardvark. With a large quantity of 15-minute or hourly data, for example, it wouldn't be sensible to use Aardvark as it has no built-in functions for visualising and quantifying diurnal variation. But if you should be interested in medium or long-term trends as well as short-term variation, a useful trick is to do some simple pre-processing of the high-frequency data to generate, say, weekly or twice-weekly averages, and then put that data set into Aardvark.
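The pre-processing trick mentioned in footnote 8 is easy to do outside Aardvark. The Python sketch below is one possible way of reducing high-frequency readings to weekly means; it assumes you already have your readings as (timestamp, value) pairs, and the function name weekly_means and the sample readings are hypothetical. Aardvark itself provides nothing of this kind.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_means(samples):
    """Average (timestamp, value) pairs by calendar week (weeks start on Monday).

    Returns a list of (week_start_date, mean_value) in date order.
    """
    buckets = defaultdict(list)
    for stamp, value in samples:
        # Roll each timestamp back to the Monday of its week
        week_start = stamp.date() - timedelta(days=stamp.weekday())
        buckets[week_start].append(value)
    return [(wk, sum(vals) / len(vals)) for wk, vals in sorted(buckets.items())]


# Hypothetical hourly readings covering two whole weeks
start = datetime(2011, 5, 2)  # a Monday
readings = [(start + timedelta(hours=h), 10.0 if h < 168 else 20.0)
            for h in range(336)]

for week_start, mean in weekly_means(readings):
    # Date as dd/mm/yy followed by one value, ready to be laid out in a data file
    print(week_start.strftime("%d/%m/%y"), f"{mean:10.3f}")
```

The output rows would still need the header text and a matching control file before Aardvark could read them, and remember that the data file must be in date sequence.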
4.3 The control file

The control file contains all the supplementary information that Aardvark needs to make sense of the data file. The control file is constructed slightly differently depending on whether your data file is in fixed or comma-separated format: this affects (a) its extension name, and (b) what you put in the final row of the file. In all other respects, however, the two files are identical, and the illustrations that we give below using CTL examples apply equally to CSC files.

Name of control file

The control file must have the same root name as the corresponding data file, and be stored in the same folder. The control file extension should be:
• 'CTL' when relating to fixed-format data files; and
• 'CSC' when relating to comma-separated data files.

For example, a fixed-format data file called \KING\PENGWYN.DAT would need to have an associated control file called PENGWYN.CTL stored in the \KING\ folder. For a CSV data file called \Ewe\Piloch.CSV, the associated control file should be called Piloch.CSC and be stored in the \Ewe\ folder.

Structure of control file

It is a great deal more tedious to read through a detailed description of the composition of the control file than it is to set one up in practice – so don't be put off by the apparent complications. In working through the following descriptions, you'll find it helpful to refer at each stage to the FIELD.CTL example in Table 4.1.

Row 1: Title of data set – may be up to 40 characters long. (This will appear at the top of every Aardvark screen.)

Row 2: Number of determinands (nD) to be read from the data file. Aardvark allows you to read in up to 75 determinands (though the data file itself may contain a greater number).

Next nD rows: Determinand titles – one title per row. Titles can contain any number of characters, within reason.
However, keeping to a maximum of around 12 (the old Aardvark limit) is sensible, as otherwise the graph and table labels can become cluttered.

Following row: Eight 'data control' quantities, separated by spaces, by which you give detailed instructions about the various conventions that you wish Aardvark to adopt in reading the data file. We define these below, in the section called 'Data Control row of control file'.

Final row: The format by which Aardvark is to read the required data from the data file. We explain below how this is built up, in the section called 'Data Format row of control file'.

Data Control row of control file

The last-but-one row of the control file contains eight control quantities. These are defined as follows:

1. Number of rows in the data file to be skipped before the data starts.

2. Number of field containing Day (1, 2 or 3).

3. Number of field containing Month (1, 2 or 3).

4. Number of field containing Year (1, 2 or 3).

5. The end-of-file indicator. This may be either
- "0" for automatic end-of-file detection; or
- any convenient non-zero quantity (e.g. "98789.") by which the end of the file is signalled; this must appear in every determinand field of the data file.

6. The missing values (MVs) indicator, defined as follows:
- "0" if MVs are represented by zero or blank in the data file;
- "1" if MVs are represented by "*" in the data file; or
- any convenient non-zero quantity (e.g. "-99") by which you have flagged MVs in the data file.

7. The less-than indicator. You have the choice of three possible values:
- "0" if there are no less-than values in the data file;
- "-1" if you want Aardvark to interpret all negative data values as less-than values; or
- "-2" if the symbol "<" is used to represent less-than values in the data.

8. The factor by which you wish to multiply the recorded less-than values.
For example:
- "0.5" would cause "less-than 0.02" to be replaced by 0.01;
- "0" would replace all less-than values by zero. (This, in conjunction with a "0" choice for the MV indicator, would then allow you to ignore all less-than values.)

So to take the example of FIELD.CTL, the Data Control row tells Aardvark that:
• the first 4 rows of the data file are to be skipped;
• the day, month and year values are in fields 1, 2 and 3;
• the end of file is flagged by a row of 999s (you don't need to include the decimal point);
• missing values appear as zeros or blanks;
• less-than values (if any) will be flagged by negative entries;
• where less-than values do occur, they should be replaced by 0.5 times the limit value.

Have a look now at the control files for the other two data sets. Particular points to notice in MOSS.CTL (Table 4.3) are the automatic end-of-file detection; the order of the Day, Month and Year fields (3, 2, 1); and the use of "<" to indicate less-than values.

Data format row of CTL control file

This is how to proceed in cases where you have a fixed-format data file. In the final row of the control file you specify the format by which Aardvark is to read the required data from the data file. The characters '(' and ')' must always be the first and last to appear in the format statement, and may not be used elsewhere in the statement. The rules governing what goes on inside the brackets are broadly the same as those applying to format statements in the computer language Fortran. The simplest way to explain this is by looking at the three examples shown in Tables 4.1, 4.2 and 4.3.
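If you find code easier to follow than format strings, the Python sketch below mimics how a Fortran-style format walks along one row of a fixed-format file. It is an illustration only: the function read_fixed is hypothetical, it handles just three kinds of descriptor (nX to skip n columns, Iw for an integer of width w, and Fw.d for a real number of width w, each optionally repeated), and it makes no attempt to reproduce Aardvark's special handling of "<" and "*". The sample row is built to fit the FIELD.CTL format from Table 4.1; its exact column spacing is an assumption.

```python
import re


def read_fixed(fmt: str, line: str):
    """Read one line using a small subset of Fortran-style edit descriptors.

    Supported: nX (skip n columns), rIw (r integers of width w) and
    rFw.d (r reals of width w; the .d part is ignored here).
    Blank fields are returned as None.
    """
    values, pos = [], 0
    for token in fmt.strip().strip("()").split(","):
        token = token.strip().upper()
        m = re.fullmatch(r"(\d*)([XIF])(\d*)(?:\.\d+)?", token)
        if not m:
            raise ValueError(f"unsupported descriptor: {token!r}")
        lead, kind, width = m.groups()
        if kind == "X":
            pos += int(lead or 1)  # skip columns without reading anything
            continue
        for _ in range(int(lead or 1)):
            field = line[pos:pos + int(width)].strip()
            pos += int(width)
            if not field:
                values.append(None)
            else:
                values.append(int(field) if kind == "I" else float(field))
    return values


# A row laid out to fit the FIELD.CTL format (spacing assumed, values from Table 4.1)
line = " 09/01/73" + "1330 ---T 21 KL003".rjust(20) + "28.000".rjust(10) + "3.200".rjust(10)
print(read_fixed("(1X,I2,1X,I2,1X,I2, 20X, 3F10.0)", line))
```

The three integers are the day, month and year; the remaining entries are the determinand values, with None standing in for a field that was blank on the row.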
First let's look at the data format row in FIELD.CTL. The format specified for FIELD.DAT is:

(1X,I2,1X,I2,1X,I2, 20X, 3F10.0)

This is shorthand for:

1X      skip the first column;
I2      read a 2-digit Integer;
1X      skip a column (so as to jump over the "/" symbol);
I2      read a 2-digit Integer;
1X      skip another column;
I2      read a 2-digit Integer;
20X     skip 20 columns (so as to jump over various unwanted fields on the file);
3F10.0  read 3 data values each occupying a field-width of 10 columns (the decimal point is handled automatically when Aardvark reads in the data).

Now let's look at the Data Format row for the control file HALDER.CTL shown in Table 4.2, namely:

(I2,1X,I2,1X,I2,2X,F10.0,2X,F10.0,2X,F10.0)

Though broadly similar to the FIELD.CTL format, this differs in one noteworthy respect. The water undertaking which provided this particular data set had the convention of flagging unacceptable determinand values with an adjacent '*'. In the portion of the data file shown, for example, we can see two sequences of unacceptably low DO concentrations. To skip round all such flags wherever they may occur, we've specified the format for each determinand as '2X,F10.0' meaning 'skip two columns, then read a field of width 10' – rather than as F12.0.

Finally, let's look at the Data Format shown in Table 4.3 for MOSS.DAT. This is slightly more complicated because there are more determinands (viz 20) on the data file, and also because we've chosen not to read two of them (Raw and Final PV).
So: the format is:

(1X,I2,2I3,1X,F3.0,5F6.0,6X,9F6.0,6X,3F6.0)

and this is shorthand for:

1X      skip the first column;
I2      read a 2-digit Integer;
2I3     read 2 Integers from consecutive fields of width 3;
1X      skip one column;
F3.0    read one data value occupying a field-width of 3;
5F6.0   read 5 data values each of field-width 6;
6X      skip the next 6 columns;
9F6.0   read the next 9 data values each of field-width 6;
6X      skip the next 6 columns;
3F6.0   read the final 3 data values each of field-width 6.

You may have noticed the string of less-than values occurring for the determinand 'nitrite in the final water'. Because these are flagged by the "<" symbol, we specified the "-2" option for the less-than indicator in the control file. Aardvark is therefore on the look-out for "<" signs dotted about in the data file, and so it's perfectly OK for the "<" symbol to appear within the specified field-width of a determinand.

(We mention this point in case you happen to be familiar with Fortran, and are wondering how something that looks very much like a standard Fortran format statement can cope with such things. Aardvark's format specification did indeed originate as pure Fortran; but we decided to make it more 'intelligent' in the way it can recognise and interpret certain characters like "<" and "*".)

Data format row of CSC control file

In contrast to the previous section, the way to proceed in cases where you have a comma-separated data file is extremely simple. All you need in the final row of the CSC file is a series of 1’s and 0’s, separated by commas, containing one value for every field in the data file. The digits ‘1’ and ‘0’ denote ‘read this field’ and ‘skip this field’ respectively. So there needs to be a ‘1’ corresponding to each determinand you want to include. You also need to have three 1’s picking out the Day, Month and Year fields for the date (in whatever order you’ve specified in the previous row of the CSC file).
And, as with the CTL file, these fields must come to the left of any of the determinands. So: the total number of 1’s will equal nD + 3, where nD is the required number of determinands.

A good way to see for yourself exactly what a valid CSC file looks like is to have a go at saving a data file from within Aardvark, as this creates both a CTL and a CSC file automatically. We’ll be describing this feature shortly in Section 5.3.

Table 4.2 Data and control files for Halder Brook

Portion of data file HALDER.DAT

Halder Brook at Carousel Lane
CHAR:          DO     BOD   NH4-N
04/04/73     11.2     2.3    0.43
10/05/73     10.4     1.1    0.05
08/06/73      5.0*    2.0    0.13
09/07/73      5.7*    3.0    0.07
07/08/73      8.4*    3.0    0.30
05/09/73      6.9*    1.0    0.16
04/10/73      8.3*    1.0    0.12
02/11/73      9.4     1.8    0.07
03/12/73     11.9     1.5    0.21
08/01/74     12.1     1.9    0.30
06/02/74     12.4     2.0    0.23
07/03/74     11.5     1.0    0.33
05/04/74      9.9     2.8    0.07
   :           :       :       :
02/04/87     11.5     2.0    0.04
23/04/87      9.8     4.0    0.05
08/05/87      9.4     1.0    0.04
22/05/87      9.1     1.0    0.04
08/06/87      9.9     1.0    0.10
22/06/87      7.0*    4.0    0.08
07/07/87      7.2*    1.0    0.07
21/07/87      8.8*    2.0    0.09
05/08/87      7.9*    1.0    0.04
19/08/87      8.6*    5.0    0.04
31/12/87     999.    999.    999.

Control file HALDER.CTL

Halder Brook at Carousel Lane
3
DO
BOD
NH4-N
2 1 2 3 999 0 -1 .5
(I2,1X,I2,1X,I2,2X,F10.0,2X,F10.0,2X,F10.0)

Table 4.3 Data and control files for Moss Keatose Water T. W.
Portion of data file MOSS.DAT

Moss Keatose Water Treatment Works
First ten detds relate to Raw Water: Condtvty, pH, Alkalnty,
Second ten are corresponding Final Water Detds
81 1 5 740 8.31 196 .150 4.20 .024 2.00 62 1.20 10.84
81 1 12 760 8.14 196 .120 4.20 .012 1.80 61 1.03 11.00
81 1 19 760 8.34 196 .170 4.48 .011 2.00 62 1.21 11.39
: : :
85 12 16 715 8.29 217 .128 2.20 .039 2.72 66 1.74 11.70
85 12 23 730 8.36 218 .089 3.10 .037 2.36 67 1.42 11.30
85 12 30 720 8.29 216 .106 4.50 .035 2.16 65 1.85 11.90

Amm_N, Nitrate, Nitrite, P.V., Chloride, Phsphate, Silicate
.755 8.03 194 .050 4.20 <.001 1.40 62 1.15 10.56
.760 7.96 192 <.010 4.10 <.001 1.24 61 .96 10.80
.770 8.19 196 .092 4.48 <.001 1.40 62 1.16 10.92
.: .: .:
.710 7.91 214 .011 2.50 <.001 1.84 66 1.86 11.30
.725 7.99 214 .001 3.30 <.001 1.72 66 1.62 10.80
.730 8.06 215 .008 5.10 .002 1.68 66 2.02 11.60

Control file MOSS.CTL

Moss Keatose Water Treatment Works
18
Raw Condtvty
Raw pH
Raw Alkalnty
Raw Amm_N
Raw Nitrate
Raw Nitrite
Raw Chloride
Raw Phsphate
Raw Silicate
Fnl Condtvty
Fnl pH
Fnl Alkalnty
Fnl Amm_N
Fnl Nitrate
Fnl Nitrite
Fnl Chloride
Fnl Phsphate
Fnl Silicate
3 3 2 1 0 1 -2 0.5
(1X,I2,2I3,1X,F3.0,5F6.0,6X,9F6.0,6X,3F6.0)

4.4 Trouble-shooting

We do have to admit that there's a possibility of making a mistake when setting up a control file. So if your first attempts produce nothing more than a disagreeable error message when you try running Aardvark, don't worry. It will almost certainly be something simple that you've not got quite right. So don't fly into a panic – just work methodically through the following checklist:

1. First try running Aardvark with one of the three example data sets, just to confirm that it’s been installed correctly, and that it hasn't itself developed a bug.

2. Next, look at each item in your control file to check that you haven't made a typographical slip.
In particular, check that
• there’s at least one space between every item in the data control row;
• (for CTL files) there’s exactly one opening bracket and one closing bracket in the data format row; or
• (for CSC files) the 1’s in the data format row add up to three more than the number of determinands you specified; and
• the number of determinands you specified agrees with the number of determinand titles you provided.

3. If nothing obvious seems to be wrong, look more closely at your specification of the Data Format row of the control file and make sure it agrees with the columns of the data file; it's easy to get 'one out' when counting.

4. Still no joy? Well, go and have a cup of coffee at this point to calm your nerves. Then, perhaps, you might have another look through the three examples we gave in the previous section; it's just possible that you've misunderstood something.

5. If you're still in trouble at this point, we're beginning to run out of ideas for self-help. So it's probably time you enlisted the help of a friendly colleague in computing – preferably somebody who has previously assisted you in downloading the data from the archive onto your PC. (If they are familiar with Fortran format statements, so much the better.) Get them to run an eye over your data and control files. It may well be that there's a problem with 'carriage return' or 'line feed' codes – mysterious instructions that get sent along with the file but can be invisible when you look at the file in a text editor. If so, the method of data transfer will need to be modified appropriately (no problem for a computer expert, but a pain in the Aardvark for the ordinary user).

6. If all else fails, give WRc a ring, or email the offending files to us, and we'll be glad to discuss your problem with you.

Let's end on a brighter note.
Even if you do have teething problems, once you do sort them out – as you will – everything will be plain sailing thereafter. That's because all subsequent files you pull off the archive will be in exactly the same format as the first – and will follow all the same conventions for end-of-file flags, missing values and less-than values. So it'll be the work of a moment to prepare control files for subsequent data sets once you've got the first one working OK.

5. Outputs from Aardvark

5.1 Printing

The most immediately obvious way of getting outputs from Aardvark is to print them. You’ll see from the File menu (see Table 3.1) that there are two Print options: Quick Print and Print. As the first of these implies, Quick Print is the one to go for when you just need a cheap and cheerful working copy of the current screen, and you don’t want to be bothered with changing any of the default settings.

In contrast, Print lets you control your output format – which means that you do need to spend a little time selecting your required options. You can choose to have between one and four Aardvark screens on a page (either portrait or landscape), and you can set the margins to the size you want. You can include a dummy blank plot: this is useful if, say, you’re printing a standard sequence of graphs to a three-per-page format, but the data for one of the graphs is missing.

You also have a lot of control over the appearance of an Aardvark screen, both globally and for individual screens, before you get to the stage of printing it. At the global level, if you click on Defaults | Printer fonts you can select the required printer font and point size for each of four separate levels of text (main and sub-headings, axis labels and body text). Don’t forget to click on File | Save Default Settings so that Aardvark remembers your preferred options for next time.
Then within a particular screen, if you click on Options | Axis/Title Edit you can change the scales of graphs; switch grid lines or date headings on or off; and alter the captions on graphs and tables. This last feature is very handy if you’re preparing Aardvark material for a report, as it lets you edit in titles such as: Figure 3.7 – LAS concentrations in The Wash at Mundae Mourne.

5.2 The Copy Graph facility

Still on the subject of report preparation, you’ll find that the File | Copy Graph option is a particularly useful feature. This writes the current screen to the Clipboard; so if you then switch to a previously opened document – in Word, typically – all you need to do is click on Edit | Paste to drop the graph or table into your report. So it’s rather like the Windows Print Screen command except that it copies just the active screen and not all the unwanted peripheral stuff. This feature was used, by the way, to incorporate the numerous illustrative plots that you’ll find in Appendix A.

5.3 Exporting data

During the course of an Aardvark investigation, you’ll often find yourself creating new determinands. For example, you may need to:
• save the deseasonalised residuals from a seasonal model;
• transform several determinands by taking logs;
• use the Data | Restrict by Value option to delete a couple of outliers; or
• use the Data | Restrict by Date option to pick out a three-year window of data for a classification or compliance assessment.

Having gone to the bother of creating all these new determinands, it can be annoying to lose them as soon as you exit from Aardvark – especially if there’s a chance that you might wish to continue with the investigation in a subsequent session. This is the situation that File | Save Data/CSV file is designed for.
This option lets you select any or all of the determinands that you’ve created during a session – along with any of the original ones you want to hang on to – and write them to a new Aardvark-type data file. Furthermore, the file is arranged in fixed-width columns separated by commas. This means that the format is suitable for reading into Aardvark either as a fixed-format or a CSV file. So to facilitate either option, Aardvark automatically creates both a CTL and a CSC control file for you. (Ingenious, eh?)

There’s just one minor limitation with the Save Data option that you should be aware of: this concerns 'less-than' values. As soon as Aardvark reads the original data file, it modifies any less-than values that it encounters according to the instructions you’ve given it in the control file (for example, by replacing each one with 0.5 × L, where L is the limit value). From that point onwards, therefore, the identity of any less-thans in the original file has been lost, and so the new Aardvark output file that you create will contain no indication as to which (if any) data values originally started out as less-than values. But this is rarely a problem.

Appendix A Using Aardvark – Illustrative Examples

A1 Enquiry I: Field Raynes Sewage Treatment Works

OK, then; here goes. First we'll load Aardvark, open a data file and take a general look at the data. Then we'll investigate one of the determinands in more detail, and go on to the question of whether it shows any seasonal pattern.

The first thing you need to do is to start up Aardvark and open the data file.

► Get into Aardvark by double-clicking the Aardvark icon.

► Click File on the Menu Bar and then click Open Data File... in the File menu.

You’ll be shown the 'Load dataset' dialog box. (If you want to quit this dialog box without selecting a file, click the Cancel button.)

►
Now find the name of the data file by browsing among the list boxes. You may need to search through:
• Drives, e.g. C:
• Directories, e.g. Program Files\A4W
• File Names, e.g. FIELD.DAT
until you find the file 'C:\Program Files\A4W\FIELD.DAT'.

► When you've found the file 'FIELD.DAT' and clicked on it so that it's highlighted, click the OK button.

Aardvark now reads the data file. Two new buttons appear on the Tool bar. The one on the left provides a spreadsheet-like table that allows you to examine the data, and the other provides an Overall Summary.

Whatever specific questions you may have about quality trends and variations, it's always a good idea to spend a few minutes at the outset just having a general look at summaries of the data. If you've not had the opportunity in the past to do this, you'll hardly believe how productive such sniffing around can be.

► So, select Tables on the Menu bar and then select Overall Summary in the drop-down menu. (Or click the Overall Summary button on the Tool bar.)

Screen 1: Overall Summary

For each of the three determinands on file, this screen gives us
• the number of samples, N (with the number of those that are less-than values, if any, shown in brackets), and the sample minimum, mean and maximum;
• the St.Dev. value – the conventional standard deviation, which is a measure of the overall variability of the data;
• the SDD value – which measures the short-term standard deviation; and
• a miniature histogram showing roughly how effluent quality is spread between the minimum and maximum values – notice the characteristic right-hand skewness in each case.

► Have a quick look at the screen. Then press the F1 function key to see the Help text for this screen.

This 'context-sensitive' Help is available for all Aardvark screens. Do use it at any time during these enquiries if you need a little more explanation.
(Clicking the Help Tool on the tool bar takes you to the top of the Help hierarchy; there may be times when you want to do this, but generally F1 will be more useful.)

► Click on the Close window button in the top right of the Help screen to quit Help and get back to Aardvark.

It would be nice to know how much of the overall variability is due to random sampling and analytical error, and how much to systematic trends. This is something that we can't tell just by looking at a histogram, but we do get a clue from the St.Dev. and SDD columns. Since the SDD value measures the short-term standard deviation, the smaller this is in relation to the overall St.Dev., the more likely we are to find something interesting upon closer investigation.

In fact, Aardvark has performed its first statistical significance test. All three SDDs are displayed in red (rather than in black) with a pair of asterisks beside them. The red colouring indicates that the SDD is significantly smaller than the St.Dev., and the two stars indicate a significance level of 1%. (We have used the standard convention, viz one star for 5%, two stars for 1% and three stars for 0.1%.)

So: all three determinands are worth investigating. On the basis that Suspended Solids looks the most promising candidate, let's look at that determinand in more detail.

First, two asides...

1. You'll see that three new buttons have appeared on the Tool bar. These are for:
• copying the current window into the Clipboard;
• setting up a print out of up to four selected windows;
• a Quick print of the current window.

2. You may find that the font sizes on the screen are not as clear as you would like. If so, it's easy to change them by selecting Screen Fonts from the Defaults menu, and then selecting which type of text you want to change. For example, try changing the Sub Title fonts to Arial, Regular style and size 8.
If you like the new font, you can save it for next time by selecting Save Default Settings from the File menu. If you don't save it, then next time you enter Aardvark the font will return to its previous setting.

Back to our analysis. We've decided to take a closer look at Suspended Solids.

► Choose Select Determinand from the Single menu.

► Click S.S(105) and click OK – or just double click S.S(105).

The determinand name will appear in the Status bar at the bottom of the screen. Now we've got 11 new buttons on the Tool bar. These correspond to the 11 functions available on the Single menu.

► Click Single on the Menu bar to see what they are.

The first option is the Determinand Summary, so let's select that one.

► Click Determinand Summary in the Single menu.

Screen 2: Determinand Summary: SS(105)

For our chosen determinand, this screen now gives us:
• more comprehensive summary statistics,
• a more detailed histogram, and
• a time series.

When we've browsed around this screen for a while, various questions start coming to mind. For example:
• The zig-zags in the time series look rather regular. Could there be some seasonality present?
• Quality seems to have improved at around 1979. Can we quantify this more precisely?
• The 95%ile is 56.8 mg/l but our consent is only 45 mg/l. Does this mean that the effluent will fail every year?
• The histogram looks a bit skew. Is it log-Normal, I wonder?

Already, you'll notice, we're wanting to follow up specific inquiries, whether or not we'd had them in mind at the outset. So let's start by following up the question of seasonality...

► Choose Year-on-year Plot from the Single menu.

Screen 3: Year-on-Year: SS(105)

Interesting, eh?
Aardvark has plotted the first year (1973) against its date within the year (January to December), plotted the second year (1974) over the top on the same graph and so on for each year until the final year (1985). The idea is to show whether there is any consistent pattern across all the years.

There's a very clear tendency for SS values to be both higher and more variable in the first half of the year (putting to one side the unusually bad samples in January 1975 and June 1978). In contrast, quality is much more tightly bunched in the second half of the year, and is at its best in around August and September.

(Again, if the fonts for the titles or the axis labelling aren’t satisfactory on your screen, you could change them as described in Aside 2 above.)

► Select Options on the Menu bar. Click Lines and see the lines joining the points disappear from the graph. (You could also click Points and make the data disappear altogether, but this would be, er, pointless.) If you click Lines again, the lines will return.

► Select Fit Seasonal Model from the Options menu.

Screen 3a: Year-on-year: SS(105)

Aardvark fits a shifted sine curve to the Year-on-year plot. This appears to fit the data quite well in this example. But is it statistically significant?

► Select View Fitted Model from the Options menu.

Screen 3b: Seasonal model details [See the graph on the screen]

You'll now see a screen giving you information about the fitted seasonal model. It's quite complicated, so we'll skip this on first reading. (You can get more explanation by pressing F1 for Help and searching for ‘Year on year plot’.) But the fact that Aardvark has drawn a fitted model at all indicates that it is statistically significant. The significance level that Aardvark is currently using is shown in the bottom right corner of the screen. This is 5% in Aardvark's default settings, but you can change it if you wish.

► Click OK to exit this window.
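To make the idea of a 'shifted sine curve' a little more concrete, here's a minimal sketch in Python using NumPy. We're assuming a single annual sine wave, y = mean + A·sin(2π(t − shift)), fitted by ordinary least squares, where t is the date in decimal years. This guide doesn't spell out Aardvark's exact model or its significance test, so the function name, the parametrisation and the synthetic data below are all illustrative assumptions.

```python
import numpy as np

def fit_seasonal(t, y):
    """Least-squares fit of y = mean + A*sin(2*pi*(t - shift)).

    t is the sampling date in decimal years; only its fractional part
    matters because the sine wave repeats every year.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Linear form: mean + b*sin(2*pi*t) + c*cos(2*pi*t)
    X = np.column_stack([np.ones_like(t),
                         np.sin(2 * np.pi * t),
                         np.cos(2 * np.pi * t)])
    (mean, b, c), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(b, c)
    # b = A*cos(2*pi*shift) and c = -A*sin(2*pi*shift)
    shift = (-np.arctan2(c, b) / (2 * np.pi)) % 1.0
    return mean, amplitude, shift

# Synthetic example (made-up numbers, not the Field Raynes data).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 13.0, size=140)
y = 28 + 10 * np.sin(2 * np.pi * (t - 0.95)) + rng.normal(0, 3, size=140)
mean, amplitude, shift = fit_seasonal(t, y)
print(round(mean, 1), round(amplitude, 1), round(shift, 2))
```

Writing the shifted sine as a sum of a sine and a cosine term keeps the fit linear, so no iterative optimiser is needed; the amplitude and shift are recovered afterwards from the two coefficients.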
Now let's look at another function in Aardvark which can be a very useful pointer to the presence of a repeating seasonal pattern...

► Select Autocorrelations Plot from the Single menu.

Screen 4: Autocorrelations Plot

To see what this is telling us, take the example of the point for lag 11. This shows that the correlation between all pairs of SS values 11 samples apart is a little under 0.5. Moreover, this point on the plot is higher than those on either side – i.e. for lags of less than 11 or more than 11. In other words, there is a natural tendency for SS to be particularly similar to its value 11 samples ago. But 11 is the average number of samples taken in a year. (Take that on trust for the moment; you'll be seeing the annual sampling frequencies for yourself in Enquiry II.) So SS values are most similar at similar times of year – which is just another way of saying that SS has a seasonal cycle.

What's more, the seasonality is so persistent that it causes sample values exactly two, three and even four years apart to generate similar peaks in autocorrelation – as we see from the succession of peaks at lags of 23, 35, 46 and 56.

So: a consistently repeating wave pattern is strong evidence for a seasonal component in the determinand we're looking at. But why, you may be wondering, does the wave drift steadily down the screen? Let's just put that question to one side for the moment, as we'll be answering it later.

Before placing much credence in the Autocorrelations plot, however, you should take a look at the degree of regularity with which your data set was collected over time. Why? Well, ideally, the data should have been collected with a fixed interval between successive observations. Small variations of a day or two, or the occasional large variation, may not matter too much.
But if there is a large spread in the inter-sample days, you shouldn't attach much importance to the shape of the graph as the autocorrelations will be very difficult to interpret in practice.

► So: select InterSample Times Plot from the Single menu. Then, select Sideways Histogram from the Options menu.

Screen 5: Intersample Times Plot

The ideal picture would be a field of green dandelions all of identical height, and a sideways histogram with a single tower. We certainly don't have that here. There are two exceptionally tall stems, telling us that on two occasions there was a gap of about four months (i.e. 120 days) between successive samples. We can also see that there were a further half a dozen occasions when the gap was two months instead of the customary one.

Even so, the picture isn't too bad. The sideways histogram shows that the majority of intersample intervals were between 20 and 40 days. So, the pattern we've seen in the autocorrelations is probably not just a quirk of the timing of the samples.

OK: this is a good point to stop for a comfort break. But if you feel like carrying on with Enquiry II, please do so.

► To get out of Aardvark, select Exit from the File menu.

A2 Enquiry II: Field Raynes Sewage Treatment Works

By the end of Enquiry I, we'd built up a good understanding of the seasonal behaviour of Suspended Solids (SS) at Field Raynes STW. In the second part of the enquiry, we'll continue our examination of SS quality by focussing on variations in the longer term.

If you've come straight from Enquiry I without a break, ignore the next block of instructions.

► Get into Aardvark by double-clicking on the Aardvark icon.

► Select Open Data File... from the File menu.

► Browse among the list boxes until you find the file 'FIELD.DAT'.

► When you've found and highlighted it, press the OK button.

►
When Aardvark has read in the data file, choose Select determinand from the Single menu. Click S.S(105) and click OK.

Now we're ready to continue.

► Select Time Series Plot from the Single menu.

Screen 6: Time Series Plot

Aardvark plots the data values against date and overlays the annual means onto the time series data. There's a clear improvement in quality from 1979 onwards. As yet, though, we have no way of telling which of those means are genuinely different from which others. (But we will have – be patient.)

► From the Options menu, select Highlight extremes....

► Enter the 95%ile consent, namely 45 mg/l, by typing 45 in the Upper Limit box. Leave the Lower Limit blank. Click OK.

Screen 6a: Time Series Plot plus highlighted exceedances of limit

Aardvark shows the consent as a horizontal red line, with those data points exceeding the limit highlighted in red. The clusters of exceedances up to 1978 clearly illustrate the problems that Field Raynes STW had at one time with SS compliance. But we can see that since 1978 there has been only the occasional exceedance every couple of years or so.

► Now select Sideways Histogram from the Options menu.

Screen 6b: Time Series Plot with Sideways Histogram

Aardvark has drawn the sideways histogram to the left of the time series. Notice the exact equivalence between the red-circled exceedance points in the time series and the red tail of the histogram. Notice also how, if you were looking at just the histogram on its own, you would have no idea that the percentage compliance figures were so much better in the second half of the data. So that's a nice illustration of how you can lose information through indiscriminate pooling.

► Select Yearly Statistics from the Single menu.

Screen 7: Yearly Summary

To round off this sequence of displays, we've asked for a table of annual summary statistics.
We can see that the annual sampling frequency was typically 11 or 12, slipping back occasionally to 9 or 10. So the mean frequency is about 11, as we remarked in Screen 4.

This is another good place to have a rest from Aardvark. But, if you feel like carrying straight on with the next part of the Field Raynes STW enquiry, jump to Enquiry III.

► Otherwise, just select Exit from the File menu.

A3 Enquiry III: Field Raynes Sewage Treatment Works

As we said in Chapter 1, looking at annual averages isn't the entire answer. How do we know which years are really different from which others? And in any case, trends don't always conveniently start up exactly at year-end. Furthermore, will a subtle but persistent trend show up in crude annual averages anyway?

These snags are all by-passed by Aardvark's cusum facility (short for cumulative sum); and this is the function that we're now going to demonstrate. We'll still be looking at Suspended Solids (SS) quality; if you've come straight from Enquiry II without a break, jump over the next block of instructions.

► Get into Aardvark by double-clicking on the Aardvark icon.

► Select Open Data File... from the File menu.

► Browse among the list boxes until you find the file 'FIELD.DAT'.

► When you've found and highlighted it, press the OK button.

► When Aardvark has read in the data file, choose Select determinand from the Single menu. Click S.S(105) and click OK.

Now we're ready to continue. If the idea of cusums is new to you, we do recommend that you dip into the Help screens in what follows.

► On the Single menu, point the mouse at Cusum Plot. Click but don't release the mouse button. Whilst holding the button down, press the F1 function key. This gives some introductory help on cusums. When you’ve seen enough, quit Help in the usual way.

► Now select Cusum Plot from the Single menu.
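If cusums are new to you, the arithmetic behind the plot is simple enough to sketch in a few lines of Python. The sketch assumes the usual definition, in which each point on the cusum is the running total of the deviations of the data from a target value, and it assumes the target defaults to the overall mean. Aardvark's own choice of target and scaling may differ, and the function name and the numbers are made up purely for illustration.

```python
from itertools import accumulate

def cusum(values, target=None):
    """Running total of (value - target); target defaults to the mean."""
    values = list(values)
    if target is None:
        target = sum(values) / len(values)
    return list(accumulate(v - target for v in values))

# Made-up series with a step change after the fourth sample.
series = [36, 38, 34, 36, 20, 22, 18, 20]
path = cusum(series)

# Index (counting from 0) of the highest point on the cusum.
peak = max(range(len(path)), key=path.__getitem__)
print(path, peak)
```

While the values run above the target the cusum climbs, and when they drop below it the cusum falls. So a change in mean shows up as a change in slope, and the highest point on the path marks the last sample before the drop.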
Screen 8: Cusum Plot

This opening screen shows the cusum plot of SS. You may be wondering about the pencil of rays that Aardvark's showing on the left of the screen. Again, let's turn to the Help screen for illumination (we hope).

► Press F1 for Help. When you have seen enough, quit Help in the usual way.

► Move the pencil of rays along the graph, by clicking the mouse at various points along the graph.

Screen 8a: Cusum Plot plus mobile pencil of rays

Aardvark lets us move the pencil of rays along the cusum, keeping the central ray horizontal. In the first section of the cusum, you can see at a glance that the slope is a bit steeper than the uppermost ray of the moving pencil. This tells us that the local mean is running at a level higher than 0.4 SDD above the target, that is, at above 28 + 4 = 32 mg/l...

... whereas in the later stretch, the current mean is now running at a level lower than 0.4 St.Dev. below the target level, namely at below 28 - 4 = 24 mg/l.

This option isn't just a cheap gimmick. We've found that it can provide a useful way of getting across the basic idea of cusums to others. Try it out on your colleagues...

One point to note: we've found from experience that to be able to differentiate between random noise and meaningful changes in slope it's a good idea to adjust the vertical scale so that there is an angle of roughly 45° between the horizontal and the outermost rays of the pencil of rays.

► As an exercise, change the vertical scale by clicking the Compress and Expand buttons. When you've got the outermost rays to be at about 45° above and below the horizontal, click Continue to move on to the next screen.

Screen 9: Cusum Plot plus cross-hair cursor

We've lost the pencil of rays, but gained a cross-hair cursor, lurking at the left-most point of the plot ready for business.
Now we start moving the cross-hair cursor around picking out hinge points. The mountain peak just before half-way looks an obvious candidate. So let's start by heading for that...

► Point the mouse at the peak and click. If you've not quite hit the peak, move the mouse and click again. For fine tuning, you may find it easier to click the arrows at the ends of the scroll bar below the graph. Head for observation number 64 (the current position is always indicated in the top left of the screen).

► When you're happy with the position of the cross-hair cursor, click the Select button to mark the point as a hinge.

Screen 9a: Cusum Plot with a selected hinge

Aardvark has drawn in the two slopes implied by our choice of hinge, and has carried out a statistical significance test. In the top left corner of the screen we see the observation number (62, 63 or 64 depending on where you've made your choice) and the significance level (0.1%). This tells us that the change in slopes at the first hinge is highly significant.

Now move the cursor along a bit. Note that the hinge point has turned red. Aardvark keeps track of the significance of each hinge point as new ones are added, and it colours hinges red only if they’re statistically significant.

Let's go for the point near the end where the cusum suddenly levels out, and mark a second hinge.

► Move the cursor to observation number 136 and click. Now click Select.

Let's see whether we were justified in picking this second change point...

Screen 9b: Cusum Plot with two selected hinges

Aardvark has drawn in the slopes implied by our new choice of hinge. The statistical significance test concludes that there is insufficient evidence to support our second hunch. This is indicated by NS (for Not Significant) in the top left corner. To remove this hinge, we could (but we won't) click the Unselect button.
(This has replaced the Select button because the cross-hair cursor is positioned at a hinge point.) Instead, we'll reset the graph to deselect both hinge points...

? Click Reset.

... and now ask Aardvark to perform its automated hinge selection routine...

? Click Auto.

Screen 9c: Cusum Plot with automatically selected hinge

[See the graph on the screen]

Aardvark decides that only a single hinge is appropriate, namely the one at the mountain peak. You could, if you wish, select other likely looking hinge points, but let's press on...

? Click Continue.

Aardvark now produces three windows in quick succession. We'll look at the final one first and then go back to the other two.

Screen 10: Cusum Plot

Aardvark shows us a more polished version of the cusum. Note the significance level given above the hinge point.

? Move to the next window by clicking the Control-menu button in the top left corner of the Cusum window and then selecting Next. This will reveal the window beneath...

Screen 11: Cusum Manhattan Plot

This presents a dramatic translation of the cusum analysis. In this display, our trend model is superposed in purple against the backcloth of the original time series in blue. (The green line – which we’ve sneakily added using the Options menu – is the long-term mean.) From this we see that Aardvark has sharpened up and confirmed our initial suspicion that there had been a pronounced improvement in mean quality near the end of 1978 – from about 36 to 20 mg/l.

? Move to the next window by clicking the Control-menu button in the top left corner of the Manhattan window and selecting Next. This will reveal a further window underneath...

Screen 12: Cusum Statistics

Aardvark now provides summary statistics for each of the separate sub-groups of data picked out by our cusum investigation, so that we can quantify the various changes.
We can see precisely the date at which the change took place, and the average levels before and after the change. (Thinks: I wonder what caused the change. Was the works upgraded at around that time? Who can I ask?)

So to recap, we've now shown the existence of a longer term trend characterised by a step change near the end of 1978.

Note that the SDD in each period is substantially below the Standard Deviation. This implies that there are other systematic patterns in the data, and probably reflects the seasonal pattern we've already seen in Suspended Solids.

Are there any other patterns in Suspended Solids lurking behind the seasonality and the step change? The way to find out is to remove the effect of these two features. First, we'll deseasonalise the data.

? Choose Year-on-year plot from the Single menu.

We've already had a good look at this graph so we'll press on smartly.

? From the Options menu, select Fit Seasonal model as we did before.
? When the seasonal curve has been plotted, pick Save Deseasonalised Data... from the Options menu. This will set up a new determinand called 'SS(105) Desea.'. Click OK.

Now, let's select the deseasonalised data and look at the Autocorrelations Plot.

? Choose Select Determinand from the Single menu. Select SS(105) Desea. and click OK.
? Select Autocorrelations Plot from the Single menu.

Screen 13: Autocorrelations Plot

Compared with the previous Autocorrelations plot that we looked at, we no longer have any clear evidence of periodicity in the data. But the plot still moves down the screen from left to right... This tells us that we've still got some systematic patterns in the data, but of a longer duration than the seasonality we saw earlier. So let's go for a cusum.

We shall now perform a cusum analysis making use of the Auto button.

? Select Cusum from the Single menu.
? Click Continue on the 'Pencil of Rays' cusum.
? Click Auto on the 'Cross-hair' cusum.
This looks rather familiar, so we'll move swiftly on.

? Click Continue.

We're not particularly interested in the final Cusum plot so...

? Click the Close window button on the Cusum plot title bar to close it down. This brings up the Manhattan Plot.
? Now, select Save Residuals from the Options menu.

This will create another new determinand called 'Cusum Residuals - S.S(105) Desea.'. (In a real investigation it would be a good idea to change this name to something simpler, but we’ll stick with it for now.)

? Click OK.

The new determinand has had both the seasonal pattern and the step change removed from the original SS data. Let's take a look at it.

? Choose Select Determinand from the Single menu. Then select Cusum Residuals - S.S(105) Desea. and click OK.

(If you display the determinand summary, you'll see that the determinand mean is zero. This is because the cusum residuals measure the deviations about a 'mean' model and so automatically have a mean of zero.)

Let's view the Autocorrelations Plot as we've done before.

? Select Autocorrelations Plot from the Single menu.

Screen 14: Autocorrelations Plot

We now have no apparent evidence of periodicity in the data and the plot moves horizontally across the screen. In fact, virtually all points lie between the two parallel red lines; this tells us that what we're left with now is pure white noise. There are no more systematic patterns in the data. So we can be confident that we've really got to the bottom of the main components of variation in Suspended Solids quality at Field Raynes STW.

At last we've arrived at the end of this particular line of enquiry. Close Aardvark and take a rest.

? Select Exit from the File menu to leave Aardvark.
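The white-noise reading of Screen 14 can be illustrated with a short Python sketch. It assumes that the two parallel red lines are approximate 95% limits at plus and minus 2 divided by the square root of the number of samples, which is a common textbook convention; the guide does not say how Aardvark places them. The function names are hypothetical and the data are simulated, not taken from FIELD.DAT.

```python
import math
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return num / denom


def white_noise_limit(n):
    """Approximate 95% limit for autocorrelations of pure white noise
    (an assumed convention, not necessarily the one Aardvark uses)."""
    return 2.0 / math.sqrt(n)


# Simulated residuals with no systematic pattern at all.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]

limit = white_noise_limit(len(noise))
lags = range(1, 21)
inside = sum(abs(autocorrelation(noise, k)) <= limit for k in lags)
print(f"{inside} of {len(lags)} autocorrelations lie within +/-{limit:.3f}")
```

For genuinely random residuals most of the autocorrelations should fall inside the limits, with the occasional one straying outside by chance; a run of values on one side of zero, or a repeating wave, would point to a remaining trend or seasonal pattern.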
A4 Enquiry IV: Field Raynes Sewage Treatment Works

In our enquiries so far, we've been looking mainly at the types of time trend – seasonal and longer-term – shown by Suspended Solids (SS). Now, in this final part of our examination of the Field Raynes data, we'll turn our attention to the nature of the random component of variation in SS over a seven-year period during which – as we've seen – quality remained essentially stable.

Again, do make use of the Help screens as you pass through the various screens, as and when you feel you could do with some background explanation...

Start by opening Aardvark and picking the file FIELD.DAT.

? Get into Aardvark by double-clicking on the Aardvark icon.
? Select Open Data File... from the File menu.
? Browse among the list boxes until you find the file 'FIELD.DAT'.
? When you've found and highlighted it, press the OK button.

We'll use the 'Restrict by date' function to restrict our attention to Suspended Solids data from 1-Jan-1979 onwards. (Note that Aardvark performs restrictions by setting up new determinands, leaving the original determinands unchanged and accessible for the full date range.)

? Start by selecting the Data menu. Note that this is the menu for transforming, as well as restricting, the data.
? Select Restrict the data by Date... from the Data menu. Click in the Determinand box and click S.S(105).

The New Determinand box displays the default name of the new restricted determinand. If you don't like this name, click the New Determinand box and type in a different name.

Just a quick aside about date formats. Aardvark assumes that the date format is the same as that set up in the International Setting of the Windows Control Panel. This will be one of the following: DMY (for Day, Month, Year), MDY, or YMD. We want to specify 1st January 1979 as the start date; so assuming that your machine is set to DMY, this will be represented by 010179.
(If your machine is set to YMD, 790101 is the correct entry; if it’s set to MDY you should enter 010179.) If Aardvark doesn't recognise the date that you enter, it will give you an error message and ask you to try again.

OK: back to the enquiry...

? Click the New Start Date box, and type in 010179 or 790101 as appropriate. There's no need to change the value in the New End Date box.
? Click OK.
? Now, select Restricted (S.S(105)) as the current Determinand via the Single menu.

We have previously seen that Suspended Solids contains seasonal variation. We need to remove this systematic component before we analyse the random component in this determinand. We've already seen how to do this so we'll run through it fairly quickly.

? Choose Year-on-year plot from the Single menu.

Screen 15: Year-on-year plot

[See the graph on the screen]

The seasonality is similar to what we've seen before, although with fewer years of data, so let's fit the seasonal model and save the residuals.

? Select Fit Seasonal Model from the Options menu.
? Select Save Deseasonalised Data from the Options menu. Click OK.
? Now, select Restricted (S.S(105)) Desea. as the current Determinand via the Single menu.
? Next, pick Histogram from the Single menu.

Screen 16: Histogram

The histogram that Aardvark has drawn in this opening screen looks about right, so we'll resist the temptation to tinker with the resolution. Let's try fitting a Normal curve to this data.

? Select Normal Model from the Options menu.

Screen 16a: Histogram plus Normal curve

Now Aardvark superposes the best-fit Normal curve on the histogram, and also tells us via the message at the top of the screen that the fit is poor. We don't think anyone would argue with this conclusion. (Browse the Help screens for an introduction to distribution fitting and testing the goodness of fit.)
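Looking back at the date-entry aside earlier in this enquiry, the three six-digit conventions can be sketched in Python as follows. This is only an illustration of how the same entry is read under DMY, MDY and YMD; it is not how Aardvark itself parses dates, and the function and dictionary names are hypothetical. Note that Python's two-digit year code maps 79 to 1979, which suits this example.

```python
from datetime import date, datetime

# Six-digit entry formats corresponding to the three Windows date orders.
ENTRY_FORMATS = {
    "DMY": "%d%m%y",
    "MDY": "%m%d%y",
    "YMD": "%y%m%d",
}


def parse_entry(text, order):
    """Read a six-digit date entry under the given date order.

    Raises ValueError if the text is not a valid date in that order,
    mirroring the 'try again' message described in the text.
    """
    return datetime.strptime(text, ENTRY_FORMATS[order]).date()


start = date(1979, 1, 1)
print(parse_entry("010179", "DMY") == start)
print(parse_entry("010179", "MDY") == start)
print(parse_entry("790101", "YMD") == start)
```

Because the day and month are both 01 for 1st January, the DMY and MDY entries happen to coincide here; for most other dates they would differ, so it is worth checking the machine's setting before typing a date.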
Now let's go for a Log-Normal Model in the Options menu. Unless you have strong views to the contrary, we'll take the Maximum Likelihood approach...

? Select Log-Normal Model (ML) from the Options menu.

Screen 16b: Histogram plus Normal & log-Normal curves

Now Aardvark overlays the best-fit log-Normal curve. The 'Not poor' message tells us that this model does provide an adequate fit to the data. (If you're wondering why Aardvark doesn't simply say 'Good fit', see the Help screens for an explanation...)

? Select Shifted Log-Normal Model from the Options menu.

Screen 16c: Histogram plus Normal, logN & shifted logN curves

The final curve fitted and tested by Aardvark is the shifted log-Normal. This, too, provides an adequate fit to the data (though we have no reason to prefer it to the standard two-parameter log-Normal model in this example).

Let's show the parametric 90%ile estimates.

? Select Set Percentile Value... in the Options menu. Type 90 in the box and click OK.

Screen 16d: Histogram plus curves plus percentiles

Note that in this case the three models give almost identical 90%iles. Now, try the 99%ile...

? Select Set Percentile Value... in the Options menu. Type 99 in the box and click OK.

Screen 16e: Histogram plus curves plus percentiles

[See graph on screen]

The 99%iles are quite different. This illustrates how different assumptions about the statistical distribution can change the value of the resulting percentile – perhaps quite dramatically.

? Finally, click Show Fitted Model in the Options menu.

Screen 17: Histogram with no Fitted Model Information

Aardvark removes the box containing the estimates of the parameters to the fitted models. This simplifies the graph for reports.

? Click OK.

This ends our investigation of Suspended Solids in FIELD.DAT. Now you can quit Aardvark in the usual way.

? Select Exit from the File menu.
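The point made in Screens 16d and 16e, that the choice of distribution matters far more in the tail than in the body of the data, can be reproduced with a small Python sketch. It fits a Normal and a two-parameter log-Normal by maximum likelihood (mean and population standard deviation of the values, or of their natural logs) and reads off percentiles from each. The data are made up and right-skewed, the function names are hypothetical, and the shifted log-Normal is left out; this is an illustration under those assumptions, not a description of Aardvark's fitting routine.

```python
import math
from statistics import NormalDist, fmean, pstdev


def normal_percentile(values, p):
    """p-th percentile from a Normal fitted by maximum likelihood."""
    fit = NormalDist(fmean(values), pstdev(values))
    return fit.inv_cdf(p / 100.0)


def lognormal_percentile(values, p):
    """p-th percentile from a log-Normal fitted by maximum likelihood."""
    logs = [math.log(v) for v in values]
    fit = NormalDist(fmean(logs), pstdev(logs))
    return math.exp(fit.inv_cdf(p / 100.0))


# Made-up, right-skewed concentrations in mg/l (not from FIELD.DAT).
sample = [8, 10, 12, 14, 15, 17, 19, 22, 26, 31, 38, 60]

for p in (90, 99):
    n_est = normal_percentile(sample, p)
    ln_est = lognormal_percentile(sample, p)
    print(f"{p}%ile: Normal {n_est:.1f}, log-Normal {ln_est:.1f}")
```

With skewed data like these, the two 90%ile estimates come out close together while the 99%ile estimates diverge, because the log-Normal has a much longer upper tail than the Normal.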
Footnote

OK, OK, so you've spotted the deliberate mistake. We've established that SS is much closer to being log-Normally than Normally distributed; so why didn't we run the cusum earlier on log(SS) rather than SS? Well, you have a point. But first, we didn't want to complicate matters by introducing Restrictions and Transformations early on. And secondly, the cusum method is actually so robust that it can cope very well with quite marked departures from Normality. But in this example, as it happens, a cusum analysis on log(SS) finds an extra step compared with the non-logged values; try it and see.

A5 Enquiry V: Halder Brook at Carousel Lane

Now for a change of scene: let's have a look at a set of routine quality data for a river monitoring station. As in our earlier enquiry, we'll start by having a general browse through the data summary, and then pick out an interesting-looking determinand for closer study...

We need to start by opening Aardvark and picking the file HALDER.DAT.

? Get into Aardvark by double-clicking on the Aardvark icon.
? Select Open Data File... from the File menu.
? Browse among the list boxes until you find the file 'HALDER.DAT'.
? When you've found and highlighted it, press the OK button.

We'll begin with an overall summary, as usual.

? Select Overall Summary from the Tables menu.

Screen 1: Overall Summary

For each of our three determinands, this screen gives us a set of summary statistics, plus a miniature histogram showing roughly how the determinand values are spread between the minimum and maximum values. For BOD and ammonia, we see the marked skewness characteristic of many river quality determinands. This suggests that, if we look at those determinands in more detail, it would be sensible to transform them by taking logs. But first, let's look more closely at Dissolved Oxygen (DO)...

? Click Select Determinand in the Single menu and click DO. Click OK.
? Now select Determinand Summary from the Single menu.

Screen 2: Determinand Summary

The most striking feature to spring out at us from this screen is the saw-toothing regularity of the time series plot. So let's go straight to a Year-on-year Plot to see just how strong the seasonality is...

? Select Year-on-Year Plot from the Single menu.

Screen 3: Year-on-Year Plot

Pretty dramatic, eh? We can see from this that the seasonal variation in mean DO is typically about ±3 mg/l. Let's save the deseasonalised data, as we'll need it later.

? Select Fit Seasonal Model on the Options menu. Then, select Save Deseasonalised Data... from the Options menu. Click OK.

If you were impressed by the seasonal model, just wait until you see the Autocorrelation Plot. But first, as always, it's a good idea to look at the regularity of sampling. So:

? Select InterSample Times Plot from the Single menu.

Screen 4: Intersample Times Plot

Ah, now this plot sounds a warning. The first 40 or so observations were taken at around monthly intervals. Thereafter, the sampling frequency was evidently uprated from monthly to fortnightly. So the Autocorrelations of the full series will not mean a great deal. Therefore, we'll use the Restrict by Date function to cut out all data before 1977...

? Select Restrict the data by Date... via the Data menu.
? Select determinand DO.
? Enter the code for 1st January 1977 (i.e. 010177 or 770101 as appropriate) in the New Start Date box. Don't touch the End Date. Click OK.
? From the Single menu, choose Select Determinand and then double click Restricted (DO).
? Select InterSample Times Plot from the Single menu, and then pick Sideways Histogram from the Options menu.

Screen 5: Inter-Sample Times Plot

This plot is now more satisfactory.
We do still see quite a lot of scatter around the usual inter-sample time of 14 days; but the histogram shows that these points are now in sufficient of a minority for us to look at the evidence of the autocorrelations with a fair degree of assurance.

? Select Autocorrelations Plot from the Single menu.

Screen 6: Autocorrelations Plot

How about that, then! The repeating wave pattern is so strong that it looks almost too good to be true. What's more, it shows no tendency to drift down the screen; so that tells us that there's little or no long-term trend in DO. But let's see for ourselves by performing a cusum analysis...

There’s no benefit in sticking with the restricted data set for cusum analysis, so let's go back to the full data set.

? Click Select Determinand on the Single menu. Choose DO instead of Restricted (DO). Click OK.

We're deliberately going to ignore the strong seasonality for the present simply to show what can happen to a cusum where there is very marked seasonal variation.

? Select Cusum Plot from the Single menu.

Screen 7: Cusum with pencil of rays

This opening screen shows the cusum plot of DO. The pencil of rays conforms nicely to our preferred 45 degrees above & below the horizontal, so we don't need to compress the plot.

? Click Continue.

Screen 8: Cusum with cross-hair cursor

In many cusums, there's very little difficulty in deciding where to position the hinges, to within a few observations either way. But this particular cusum presents us with a problem: because it's so dominated by seasonality, there's considerable room for debate as to where the hinges should go. The cusum seems to have major changes in direction somewhere around observation 70 and then again at around 240. But just look at all the little zig-zags! Let's look at Aardvark's choice...

? Click Auto.
Screen 9: Cusum Plot plus selected hinges

Aardvark has clearly been confused by the seasonality in the data. It has drawn in 13 different slopes. The seasonality within years is sometimes captured by a pair of slopes and is sometimes missed. The last slope covers three seasonal peaks and may represent a longer term trend. All in all, it’s a dog’s breakfast.

So, when we’re looking for longer term trends, we see that strong seasonality can create considerable difficulties for the cusum analysis. But now let's see how the cusum appears if we use deseasonalised data. (Now you can see why we saved the deseasonalised DO after Screen 3.)

? Click Cancel.
? Choose DO Desea. as the selected determinand and then ask for another cusum. (You should be getting the hang of how to do this by now...)

Screen 10: Cusum with pencil of rays

The opening screen shows we've successfully removed the oscillations caused by that impressive seasonality. The pencil of rays still conforms nicely to our preferred 45 degrees above & below the horizontal, so we won't compress the plot.

? Click Continue.

Screen 11: Cusum Plot plus cross-hair cursor

Now there's no difficulty in seeing where two hinges should go. Does Aardvark get it right? Let's see...

? Click Auto.

Screen 11a: Cusum Plot plus selected hinges

Aardvark has found the two obvious hinges. Both hinge points are drawn in red, which means that both changes in slopes are statistically significant (at a level of 5% or better).

? Click Continue.

Screen 12: Finished Cusum Plot

Now we can see that the two hinges are very highly significant, as the significance levels are both <0.1%. In other words, there's less than a 1 in 1000 chance that these apparent shifts in mean DO could be due to random sampling error.

? Move to the next window by clicking the Control-menu button and selecting Next. Or use the shortcut, Ctrl+F6. Or just use the top right Close window button.

Screen 13: Manhattan Plot

Aardvark shows us the trend model superposed against the backcloth of the original time series. The change in mean DO at around the end of 1977 and at the start of 1985 is about 1 mg/l – from 10.0 to 9.0 mg/l and back again. (You'll see the exact figures in the next screen.) So despite there being huge seasonal swings in the original data, the cusum of the deseasonalised data can pick up quite small long-term trends.

? Close the Manhattan Plot, by clicking the Control-menu button and selecting Next.

Screen 14: Cusum Summary Statistics

For the record, Aardvark gives us summary statistics for the three sub-groups picked out by our cusum analysis.

You can now go straight on to the next Enquiry or you can take a breather. If the latter...

? Select Exit from the File menu.

A6 Enquiry VI: Halder Brook at Carousel Lane

In the second part of our Halder Brook enquiry, we're switching to BOD. As always, we'll start by having a general look at the determinand summary to see what interesting features show up. If you've come straight from Enquiry V, skip the next block of instructions.

? Get into Aardvark by double-clicking on the Aardvark icon.
? Select Open Data File... from the File menu.
? Browse among the list boxes until you find the file 'HALDER.DAT'.
? When you've found and highlighted it, press the OK button.

Now continue with this enquiry.

? Choose BOD as the Selected Determinand.
? Select Determinand Summary from the Single menu.

Screen 15: Determinand Summary

The maximum observation is 13.6, which helps to explain why the histogram is skewed to the right; maybe we should do a log transformation. But two chimneys stick out like a sore thumb (or should it be two sore fingers?).
Let's see what has caused them...

? Select the Time Series Plot from the Single menu.
? Click Lines in the Options menu to switch off the blue lines.
? Click Yearly Means in the Options menu to switch off the purple lines.

Screen 16: Time series plot

The time series shows us the reason for the chimneys that we've just seen. There are three preferential values after 1980, namely 1, 2 and 3 mg/l. (Thinks: that may have been ‘cost effective’ for the lab; but is the loss of precision acceptable to the water quality officer?...)

We’re going to perform a cusum analysis next, after taking logs to make the data less skewed...

? From the Data menu, select Transform Single Determinand.
? Select BOD as Determinand. Taking logs to base 10 is the default transformation, so just click OK.
? Now, select Log-10(BOD) as the Selected Determinand via the Single menu. Click OK.

It’s a good idea to glance at the determinand summary of log(BOD) at this point, but we'll press on with the cusum analysis. So:

? Click the Cusum button on the Tool bar.

Screen 17: Cusum Plot with pencil of rays

OK, then. This opening screen shows the cusum plot of log(BOD). Unlike in the DO cusum, the pencil of rays here is a bit too wide for comfort; so we'll start by compressing the plot...

? Click Compress.

Yes, that's better.

? Click Continue.

Screen 18: Cusum Plot with cross-hair cursor

This cusum is much easier to interpret than the one for DO – and the absence of awkward ripples gives us the useful incidental message that there's little or no seasonality. There are two clear turning points for us to pick out; let's see if Aardvark thinks so too...

? Click Auto.

Screen 19: Cusum Plot plus selected hinges

Yes, Aardvark has selected the two hinges that we expected it to.

? Click Continue.
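Since the cusum here is being run on Log-10(BOD), it helps to remember what a mean on the log scale represents: antilogging it gives the geometric mean of the original values. The Python sketch below shows the transformation and the way back. The function names are hypothetical and the BOD values are made up; it illustrates the arithmetic only, not Aardvark's Transform Single Determinand routine.

```python
import math


def log10_transform(values):
    """Take logs to base 10; raises ValueError for zero or negative values."""
    return [math.log10(v) for v in values]


def geometric_mean_from_logs(log_values):
    """Antilog of the mean of the log10 values."""
    return 10 ** (sum(log_values) / len(log_values))


# Made-up BOD values in mg/l (not from HALDER.DAT), skewed to the right.
bod = [1.0, 2.0, 2.0, 3.0, 1.0, 2.0, 13.6]

logs = log10_transform(bod)
print("arithmetic mean:", round(sum(bod) / len(bod), 2))
print("geometric mean: ", round(geometric_mean_from_logs(logs), 2))
```

For right-skewed data the geometric mean sits below the arithmetic mean, because a single large value has far less pull on the log scale.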
Screen 20: Final Cusum Plot

The statistical significance tests for the differences between the slopes indicate that the changes in slope are significant at the 2% and 0.1% levels respectively. In other words, the evidence is much stronger for the second shift in mean levels than for the first.

? Display the Manhattan plot by clicking the Control-menu button and selecting Next.

Screen 21: Manhattan Plot

The three mean BOD levels are not very different compared with the variation in the data. This illustrates the power of the cusum to discover statistically significant changes in the mean level that would be difficult to pick out by eye. (Of course, they may not be practically significant...)

? Bring forward the Cusum Statistics window.

Screen 22: Cusum Summary Statistics

The means are 0.29, 0.19 and 0.30 again – corresponding, after antilogging, to geometric mean values of 1.9, 1.5 and 2.0 mg/l.

That's the end of this enquiry. You may like to have a go at analysing the ammonia data on your own. If you do, you'll see it's worthwhile doing a log transformation.

? Quit Aardvark by selecting Exit from the File menu.

A7 Enquiry VII: Moss Keatose Water Treatment Works

Our final enquiry relates to a five-year set of water quality data for a water treatment works. Here we'll be showing how to use Aardvark to investigate the joint patterns of behaviour shown by pairs of determinands, using the example of conductivity in the raw and final waters. In all, the data set contains nine pairs of raw and final water quality determinands; so you've plenty of scope for trying your hand at further investigations of your own...

Start by opening Aardvark and picking the file MOSS.DAT:

? Get into Aardvark by double-clicking on the Aardvark icon.
? Select Open Data File... from the File menu.
? Browse among the list boxes until you find the file 'MOSS.DAT'.
? When you've found and highlighted it, press the OK button.

We're going to have a look at a pair of determinands, so we need the Pair menu, rather than the Single menu that we've been using up to now.

? Click Select determinands in the Pair menu.
? Click Fnl Condtvty in the First Determinand (y) box and click Raw Condtvty in the Second Determinand (x) box. Click OK.

We've specified the determinands in that order because we want to plot Final conductivity on the Y axis against Raw conductivity on the X axis, not the other way round.

? Select Scatter Plot from the Pair menu.

Screen 1: Scatter Plot

This shows us that final conductivity is very strongly correlated with raw conductivity. (Thinks: why are there a few stragglers? Worth following up?)

? As we're interested in whether there's any change in conductivity during the treatment process, let's see whether (on average) Y = X.
? Click Y = X line from the Options menu.

Screen 2: Scatter Plot plus "Y = X" line

Now we can see that virtually all of the data is just above the Y = X line; this tells us that conductivity increases slightly but persistently from raw to final water.

Let's now investigate the through-the-works increase in conductivity in more detail by forming a new determinand, "Final - Raw" conductivity, and then doing a time series analysis on that...

? Click Data on the Menu bar and select Combine Two Determinands.
? Select Subtract (X-Y) as the transformation.
? Choose Fnl Condtvty as determinand X and Raw Condtvty as determinand Y. We've specified the determinands in that order because we want to look at 'Final - Raw'.
? In the New Details box, edit the determinand name to 'Final - Raw Conductivity'. Click OK.

Now we've created the new determinand, we want to look at its Time Series.

? In the Single menu, click Select Determinand. Our new determinand appears at the end of the list.
Select it and click OK.
? Select Time Series Plot from the Single menu.

Screen 3: Time Series Plot

Hmm, this looks interesting. The annual means suggest some sort of trend over the five years – certainly a difference between 1982 and 1983. Let's try a cusum...

? Select Cusum Plot from the Single menu.

Screen 4: Cusum Plot with pencil of rays

This opening screen shows the cusum plot of 'Final - Raw Conductivity'. The plot splits into three obvious segments; so let's get on and pinpoint them...

? Click Continue.
? On the next screen, click Auto.

Yes, Aardvark has found those two hinge points, so...

? Click Continue.

Screen 5: Final Cusum Plot

Now Aardvark has drawn in the three slopes implied by our choice of hinge, and carried out statistical significance tests for the differences between slopes. The conclusions are that the first change in slopes is significant at the 1 in 100 level, and the second at the 1 in 1000 level.

? Click somewhere on the partly hidden Manhattan Plot window.

Screen 6: Manhattan Plot

Here we see the trend model superposed against the backcloth of the original time series. This shows us that final water conductivity was running at about 9 uS/cm above raw water conductivity for the first few months of 1981; then shot up to 26 uS/cm above for the next 18 months; and then dropped back to 12 uS/cm above the raw water value for the remaining three years. (We can confirm the exact figures in a moment...)

? Click somewhere on the Cusum Summary Statistics window.

Screen 7: Cusum Summary Statistics

As usual, Aardvark now gives us those figures more precisely in the cusum summary statistics table. The close agreement between the Std.Dev. and the SDD values tells us that there is no substantial remaining component of systematic variation – seasonality, for example.
(You could confirm this by saving the cusum residuals and then plotting their autocorrelations.) So our simple step-change model has more or less said it all.

Well, that's reached the end of the final enquiry. But that doesn't mean to say that you have to pack up straight away. You've got the current selection from the MOSS.DAT data set still in Aardvark, so why not conduct your own investigation of some of the other determinands? For a start, you might try having a look at the seasonal nitrate pattern in the raw water... Then, exit from Aardvark in the customary manner.

Appendix B Statistical Tables

Table B.1 Values of the correlation coefficient just significantly greater than zero for various numbers of samples

No. of samples    Significance level
                  5%       1%
10                0.63     0.77
20                0.44     0.56
30                0.36     0.46
40                0.31     0.40
50                0.28     0.36
75                0.22     0.29
100               0.20     0.26

Table B.2 Minimum values needed for the correlation coefficient of the Normal probability plot before the Normality hypothesis can be entertained

No. of samples    Significance level
                  5%       1%
10                0.917    0.876
20                0.950    0.925
30                0.964    0.947
40                0.972    0.958
50                0.977    0.965
75                0.983    0.975
100               0.987    0.981

Appendix C Installing Aardvark

C1 Installing the Standalone version (Quick Start Guide)

To install Aardvark, just work through these simple steps:

1. Aardvark should be installed by the Administrator user of the computer. Close any other open programs before proceeding with the installation.
2. Insert the Aardvark CD into your CD drive.
Then:
• Select Start > Settings > Control Panel > Add or Remove Programs
• Click on the Add New Programs button
• Select the CD or Floppy button
• When prompted for the CD or floppy disk – click on the Next button
• D:\Aardvark32.exe should appear in the Open box (where D is the CD drive)
• Click on the Finish button to accept this
• The CD will now start and the Aardvark Installation screen will appear.
3. Click Next to continue.
4. You will be offered options to choose which features of Aardvark to install. Select the components you wish to install and click Next to continue.
5. You will be offered the choice to select the install location for Aardvark. Select your preferred location and click Install to continue.
6. Installation is now complete. Click on Close.
7. The installation will have created a program menu item (Start > Programs > WRc programs > Aardvark32) that can be used to start Aardvark.
8. Once installed you will need to register the product the first time you use it. Locate the software on the program menu and start up Aardvark. You will then be offered three options for Product Registration:
• Configure as a standalone program
• Configure as a trial
• Configure as a network client
9. Select ‘Configure as a standalone program’ by checking the box and you will be prompted for the Product Key. Type, or paste, the Product Key emailed to you into the box provided and click on ‘OK’.
10. You will be prompted that the Product Registration programme wishes to connect to the Internet to validate the installation. If you are connected to the Internet, click on ‘OK’. Otherwise, click on ‘Cancel’ and go to step 15.
11. If you selected ‘OK’ and the installation was validated proceed to step 12. If you selected ‘OK’ and the installation was not validated for some reason, jump to step 14.
12. You will be prompted to complete some contact details for your organisation. Please complete all of the fields (some are compulsory) and click on ‘OK’.
13. Provided the compulsory fields are completed, the Product Registration process is now complete and Aardvark will automatically start up.
14. If you selected ‘OK’ and the installation was not validated, an error message will be returned. Click ‘OK’ and the Product Registration process will be terminated. You will need to start up Aardvark again and recommence the Product Registration process (follow instructions from Step 9).
15. If you selected Cancel, you will be offered the option for Manual Activation. Selecting this option provides you with:
• an internet address (http://www.copyminder.com/activate.php) to use on an Internet-connected PC;
• a Product Key; and
• an Installation Code.
There is also a box in which to paste an Activation Code. When convenient, connect to this internet site and enter the Product Key and the Installation Code in the appropriate boxes and click on ‘Submit’. You will be provided with an Activation Code. This code can be copied and pasted electronically in the usual way.
16. Return to the Product Registration process and paste the Activation Code into the box provided and click on ‘OK’. Aardvark will start up automatically and is then available for use.

C2 Installing the Network version (Quick Start Guide)

To install the network version of Aardvark, you will need your Network Administrator to install the CopyMinder Network Server and to provide the users with a CopyMinder network path to enable installations on individual workstations.

C2.1 Installing CopyMinder Network Server

This process needs to be undertaken by your Network Administrator using these instructions and the files on the installation CD in the Network folder.

The Network Administrator needs to choose some paths for use by CopyMinder.
There are three paths to consider:
• Program Path - this path should contain Aardvark's .cm file (usually also the path containing Aardvark). The client machine needs write-access to this path. It can be on any machine and can be different for different client machines.
• CMNet Path - this path should contain the CMNet.exe program and one or more .cm files. The CMNet program needs write-access to this path. Ideally, this machine would have Internet access.
• CopyMinder Network Path - both CMNet.exe and the client machines need write-access to this path.

The Program Path and CMNet Path cannot be the same path. Otherwise, there are no restrictions.

Your normal installation program should put the .cm file in the Program Path for each client machine. Your installer or the Network Administrator should copy the CopyMinder Network Server (CMNet.exe) to the chosen CMNet Path. They should also put a copy of the .cm file for Aardvark into this path. (Do not use CMInstall.exe to do this.) The one copy of CMNet.exe can service any number of .cm files for different products (even from different Software Developers).

The Network Administrator should first run CMNet /U to uninstall any existing CMNet. They should then run CMNet, which will do the following:
• For each .cm file (in the CMNet Path), it will check to see if it requires activating and prompt them for their Product Key if necessary.
• It will then look for CopyMinderNetworkPath in the corresponding .cm.ini file.
• If the CopyMinderNetworkPath variable does not exist, it will prompt the Network Administrator for a path and save it in a .cm.ini file.
• It will place an icon in the System Tray for the CopyMinder Network Viewer (unless it is being installed as a Windows service).

The CopyMinder Network Server is now operational.

CMNet.exe can be run as an application or a Windows service (you cannot run it as a service under Vista at present).
To run it as a service, simply specify CMNet /S. To install the service, a keyboard and screen need to be connected to the machine, but they are not required after that. Please note that Windows services do not, by default, support networks, so you should not specify a UNC path or a mapped drive for the CopyMinder Network Path even if it refers to a local machine. You need to specify a path on the machine running the service instead (e.g. c:\abc). Ideally, you should also ensure that the service is logged on to an account that permits Internet access.

If they wish, they can create the .cm.ini file manually. The main entry that they may want to specify is CopyMinderNetworkPath. They can do this by specifying it on the command line (see below) or by creating the following lines in a .cm.ini file:

   [Main]
   CopyMinderNetworkPath=path

where path is the full path to the chosen CopyMinder Network Path.

The general syntax for CMNet.exe is:

   CMNet [/I] [/N] [/Q] [/S] [/T] [/U]

where
   /I   specifies a CopyMinder Network Path.
   /N   no icon is placed in the status bar.
   /Q   display no success (or trivial warning) messages.
   /S   starts the program as a Windows service.
   /T   terminate cmnet.
   /U   uninstall cmnet service.

In most instances, the /S, /T and /U parameters are the only ones they are likely to use (for example, CMNet /T). Any errors are recorded in a log file or, for the Windows service, in the Windows Event Log.

Using CopyMinder Network Server

When CMNet is running as an application, an icon will appear in the System Tray. By right-clicking on this icon, they can choose to terminate CMNet (the equivalent of running CMNet /T) or display a list of people currently using your program.

C2.2 Installing Aardvark on individual workstations

To install Aardvark on individual workstations, just work through these simple steps:

1. Install Aardvark on your PC following steps 1 to 7 in the installation instructions for the standalone version (Section C1).

2. Once it is installed, you will need to register the product the first time you use it. Locate the software on the program menu and start up Aardvark. You will be offered three options for Product Registration:
   • Configure as a standalone program
   • Configure as a trial
   • Configure as a network client

3. Select ‘Configure as a network client’ by checking the box and you will be prompted for the CopyMinder Network Path. Type, or paste, the CopyMinder Network Path provided by your Network Administrator into the box and click on ‘OK’.

4. Aardvark will start up automatically and is then ready for use.
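
For Network Administrators who prefer to script the preparation described in Section C2.1, the following Python sketch checks the path rules stated there and writes the two-line .cm.ini file. It is an illustration only and is not part of Aardvark or CopyMinder. The function names are invented for this example, and the assumption that the .cm.ini file is named by appending ".ini" to the .cm file name is ours and should be confirmed against your own installation. A mapped drive cannot be recognised from the path text alone, so only UNC paths are detected.

```python
import configparser
import tempfile
from pathlib import Path, PureWindowsPath


def check_paths(program_path, cmnet_path, network_path, as_service=False):
    """Return a list of problems; an empty list means the checked rules hold."""
    problems = []
    # Rule from C2.1: the Program Path and CMNet Path cannot be the same.
    # PureWindowsPath ignores case and trailing separators when comparing.
    if PureWindowsPath(program_path) == PureWindowsPath(cmnet_path):
        problems.append("Program Path and CMNet Path must differ")
    # Rule from C2.1: a Windows service should not be given a UNC path.
    # For a UNC path the "drive" part is \\server\share.
    # (Mapped drives look like ordinary drive letters and are NOT detected.)
    if as_service and PureWindowsPath(network_path).drive.startswith("\\\\"):
        problems.append(
            "A service needs a local CopyMinder Network Path, not a UNC path"
        )
    return problems


def write_cm_ini(cm_file, network_path):
    """Write a [Main] section holding CopyMinderNetworkPath; return the file."""
    # ASSUMPTION: "product.cm" is paired with "product.cm.ini".
    ini_file = Path(str(cm_file) + ".ini")
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep the capitalisation of the key
    cfg["Main"] = {"CopyMinderNetworkPath": network_path}
    with open(ini_file, "w", encoding="ascii") as handle:
        # Write "key=value" without spaces, matching the layout in C2.1.
        cfg.write(handle, space_around_delimiters=False)
    return ini_file


if __name__ == "__main__":
    # Hypothetical example paths; substitute the ones chosen for your site.
    issues = check_paths(r"C:\Aardvark", r"C:\CMNet", r"c:\abc", as_service=True)
    print(issues or "Paths look consistent with the rules checked")
    with tempfile.TemporaryDirectory() as folder:
        created = write_cm_ini(Path(folder) / "example.cm", r"c:\abc")
        print(created.read_text())
```

Running the checks before starting CMNet simply catches the two mistakes the guide warns about; CMNet itself will still prompt for a path if no CopyMinderNetworkPath entry is found.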
