E & H Services is a privately held technology and research company working in a narrow but highly specialized field: services and consulting in the analysis of persistent organic pollutants (POPs) and other hormonally active substances belonging to the group of endocrine disruptors. Part of the company is an accredited testing laboratory performing qualitative and quantitative analysis of organic compounds. In its methodological work, the testing laboratory is affiliated with the partner laboratories of the CRL (Community Reference Laboratory for Dioxins and PCBs in Feed and Food in Freiburg), so that our clients benefit from the best professional approaches.
Our strategy and vision are to be understood within a complex range of services. To fully satisfy our customers, we aim to remain professional experts in a variety of fields:
- Data analysis – mainly applying robust methods, where classical statistics falls short,
- Mechanical and civil engineering – mainly in reference to technological solutions,
- Technological solutions – emission measurements, control, technical actions, investment, in relation to persistent organic pollutants (POPs) and their related parameters,
- Diagnostics, consulting, chemistry, chemical engineering,
- IT system development, analysis and system administration,
- Project management, using robust, platform-independent software systems,
- Quality Control, Quality Assurance – mainly related to diagnostic and technological applications,
- Promotion, training, dissemination – covering all fields, with a complementary benefit for our valued clients.
Not all of these experts are currently employed by the company, but when they are chosen for our services, you can be sure they are the best available.
We are ready to improve lives by bringing novel, distinctive solutions that support environmental protection and the health of inhabitants. We reach our goals in unconventional ways, by looking at things differently. We think globally, act locally, and use the Best Available Technologies (BATs). We are proud to provide a base for the individual professional and personal growth of our professionals. We are organised and run by ideas, not by hierarchy.
Do we need data analysis?
Results from environmental monitoring are the first part of an overall assessment; they are represented by data. As many studies have shown, any serious conclusion requires the use of complex methods. A mere graphical representation goes only halfway towards extracting information from the data. For large data sets, various statistical approaches to univariate and multivariate data analysis have been used in many studies. However, the high costs of environmental surveys result in small data sets. Hence, both classical (e.g. Horn’s) and alternative robust methods for small data sets are required. One suitable method is the gnostic approach, which is well described theoretically and has been successfully applied in other scientific fields.
Why a novel approach?
Below, you can find a demonstration of a new approach to obtaining information from a data collection that, in many cases, has few values, high uncertainty, and no normality. Data always need appropriate analysis, almost always statistical. Several questions then arise. Which data are useful for calculations? What information can we obtain? Why should we need a new approach? Here, we attempt to answer these questions. Standard statistical paradigms have many fundamental problems when applied to environmental (health, industrial, etc.) data samples.
We begin by explaining the reasons for an alternative analysis. The main reasons are:
a) the Central Limit Theorem does not apply to small data sets,
b) the data may not follow any distribution with a finite mean and finite variance.
Measurements of, e.g., environmental sources have a special character: they provide too few values, with a non-normal distribution, for classical statistical analysis to be applied successfully. Because of uncertainty in data collection or in individuals, huge variance, non-homogeneity and large errors, it is very difficult to obtain reliable results. The picture above shows one such data distribution (DDT concentration in a river) with no standard attributes, few values, huge variance and a shape far from the normal distribution.
Alternative methods of analysis
Methods based on the Mathematical Gnostic Theory of uncertain data (MGT) take a completely new approach to data analysis. The MGT is a non-statistical theory of uncertain data. Despite their different background, these methods give outcomes and estimates similar to those of statistical methods, and for data satisfying the central limit theorem they even give identical results.
Here is a small example, shown in the table above, with the data samples and their probability distribution. It is created by the cumulative Empirical Distribution Function (EDF) method using relative frequencies. The last column of the table shows how the steps of the cumulative probability distribution function are built for this example data. The pictures show the results as a probability distribution and as a histogram (graph of relative frequency).
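The construction of the EDF from relative frequencies can be sketched in a few lines of Python. This is only an illustration: the function name empirical_cdf and the sample values are our own assumptions and are not the data from the table.

```python
from collections import Counter


def empirical_cdf(sample):
    """Return one (value, relative frequency, cumulative probability) row per distinct value."""
    n = len(sample)
    counts = Counter(sample)
    rows = []
    seen = 0
    for value in sorted(counts):
        seen += counts[value]  # number of observations <= value
        rows.append((value, counts[value] / n, seen / n))
    return rows


# Hypothetical sample, used only to illustrate the steps of the EDF.
sample = [2.1, 3.4, 3.4, 5.0, 7.2]
for value, rel_freq, cum_prob in empirical_cdf(sample):
    print(f"{value:6.2f} {rel_freq:6.2f} {cum_prob:6.2f}")
```

Each row is one step of the staircase function: the cumulative probability jumps by the relative frequency of that value and reaches 1 at the largest value.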
Statistical display of probability and density
The series of pictures shows examples of constructing probability and density functions from data generated from a normal (Gaussian) distribution: the histogram, and the theoretical normal density used to construct the confidence interval.
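A minimal sketch of this classical construction, using only the Python standard library, might look as follows. The generated data, the seed, the 95 % level and the helper name central_interval are our assumptions for illustration; the interval shown is the central interval of the fitted normal density, which is one common reading of the confidence interval in such pictures.

```python
import random
from statistics import NormalDist


def central_interval(dist, level=0.95):
    """Return the interval that holds `level` of the probability of `dist`."""
    tail = (1 - level) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical generated data: 200 values from a normal distribution.
rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(200)]

# Fit the theoretical normal density to the generated data.
fitted = NormalDist.from_samples(data)
low, high = central_interval(fitted)
print(f"mean={fitted.mean:.2f} sd={fitted.stdev:.2f} interval=({low:.2f}, {high:.2f})")
```

This works well when the data really are normal and numerous, which is exactly the condition that small environmental data sets usually fail to meet.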
Gnostic distribution function (GNDF)
The GNDF represents the real distribution of probability and density as a continuous function, even on a small data sample such as the one shown above.
What about real data?
Now we describe, by means of probability distributions, real data obtained by measuring real natural objects and phenomena. Data that cannot be described in the standard statistical way typically come from the environment. They also often come from chemical or clinical practice. One of the modern technologies is passive samplers (SPMD), which are used to measure the contamination of soil, water and air. With them we can measure a large range of contaminants such as persistent organic pollutants (POPs), metals and various toxic substances. All of these measurements share a great weakness: they cannot be evaluated with standard statistical methods. They are heavily biased, yet they are full of information! Some of the following examples represent such cases of environmental contamination.
There are two kinds of distribution functions: GLOBAL and LOCAL.
The global distribution function is estimated on the assumption that the data set is homogeneous. It provides a complete view of the data and shows the inner structure of the data sample in as much detail as required.
The local distribution function characterizes the data distribution even over a small subinterval of the data sample. It thus makes it possible to divide the data into groups or clusters independently of any additional conditions.
It is always necessary to consider all data, even though some values carry “additional uncertainty”, the so-called censoring of data, in this case due to the Limit of Detection (LOD, left-censored). This can be a serious problem, especially for data sets from the natural environment. Let’s show what it looks like in the case of contamination by TCDD. Let’s assume that of 106 TCDD values, 93 are below the LOD. Let’s construct three distributions: the first treats all 106 values as valid (black line), without any censored values; the second uses only the 13 valid values (blue line); and the third uses the 93 left-censored values (red line). The blue case is very pessimistic and the red case very optimistic. These are only limiting cases, and one has to choose between them. A more realistic case is shown in the second picture.
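The effect of the treatment of censored values can be sketched with plain empirical distribution functions. Note that this is a simple substitution sketch and not the GNDF itself; the 13 detected values, the LOD of 0.5 and the function names are hypothetical and chosen only to mirror the 13 / 93 split of the example.

```python
def edf_at(values, x):
    """Fraction of the values that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


def censoring_bounds(detected, n_censored, lod, x):
    """Probability of a value <= x under three treatments of censored data."""
    detected_only = edf_at(detected, x)                       # censored values dropped
    at_lod = edf_at(detected + [lod] * n_censored, x)         # censored values set to the LOD
    at_zero = edf_at(detected + [0.0] * n_censored, x)        # censored values set to zero
    return detected_only, at_lod, at_zero


# Hypothetical data: 13 detected values and 93 values below the LOD.
detected = [0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0, 5.0]
lod = 0.5
for x in (0.25, 0.5, 1.0, 3.0):
    only, upper, lower = censoring_bounds(detected, 93, lod, x)
    print(f"x={x:4.2f} detected only={only:.3f} at LOD={upper:.3f} at zero={lower:.3f}")
```

Dropping the censored values gives the most pessimistic picture of contamination, while placing them at zero gives the most optimistic one; any realistic estimate lies between these limits.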
Testing of hypothesis
One of the most interesting features of gnostic distribution functions is that they can be compared with each other, which meaningfully improves testing methods such as hypothesis tests. The method is based on comparing two distributions, one declared the null hypothesis and the other the alternative hypothesis. The outputs of the test are statistical criteria such as the significance level (alpha), the error of the second kind (beta) and the power of the test (1-beta). These values are calculated from the intersections of the distribution functions. The steps of the test are shown in the series of pictures; for illustration, the curves are generated as normal distributions by a random number generator with a varying mean value (N(30,2.5,0.5), N(30,2.8,0.5), N(30,3.4,0.5), N(30,4,0.5)).
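The relation between alpha, beta and the power of the test can be sketched with theoretical normal curves instead of gnostic distribution functions. Several assumptions are made here: we read N(30,2.5,0.5) as 30 values with mean 2.5 and standard deviation 0.5, we take the first distribution as the null hypothesis, and we place the decision threshold where the two density curves intersect, which for equal standard deviations is the midpoint between the means. The function name error_rates is hypothetical.

```python
from statistics import NormalDist


def error_rates(null, alt):
    """Return (alpha, beta, power) for a threshold at the intersection of two densities."""
    if null.stdev != alt.stdev or alt.mean <= null.mean:
        raise ValueError("sketch assumes equal spread and alt.mean > null.mean")
    # With equal standard deviations the densities cross midway between the means.
    threshold = (null.mean + alt.mean) / 2
    alpha = 1 - null.cdf(threshold)  # null is true, but the value falls above the threshold
    beta = alt.cdf(threshold)        # alternative is true, but the value falls below it
    return alpha, beta, 1 - beta


null = NormalDist(2.5, 0.5)
for mean in (2.8, 3.4, 4.0):
    alpha, beta, power = error_rates(null, NormalDist(mean, 0.5))
    print(f"alternative mean={mean}: alpha={alpha:.3f} beta={beta:.3f} power={power:.3f}")
```

As the alternative mean moves away from the null mean, the overlap of the curves shrinks, so alpha and beta fall and the power of the test rises.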
This is an example from clinical practice, where questions often arise: How can the results of clinical analyses be compared when the values cover almost the same interval? What is inside the data file? The answers are in the following pictures. The first shows the intersections, as in the test; their ratios are easy to see. The second shows as many as four independent distributions and their intersections.
Like a small data set, a huge data set may show features such as great variance, non-homogeneity and a non-normal distribution. Such data sets are usually time series, i.e. values recorded over some period. They contain many values, but evaluating them is also problematic. Thanks to the methods described above, such data can be evaluated effectively. The method calculates bounds on given intervals (lower bound, upper bound and median) and so describes the entire file of values. How does it work? We take a window of values (for instance 10 consecutive values) and, moving this window with an overlap (5 values), use the GNDF to calculate the bounds of each window (its distribution function) across the entire data file. In this way we very easily get a clear view of all the data.
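The moving window itself can be sketched as follows. Because the GNDF is not available in the Python standard library, the sketch uses sample quartiles as a stand-in for the GNDF-based lower bound, median and upper bound; the function name sliding_bounds and the example series are hypothetical.

```python
from statistics import quantiles


def sliding_bounds(values, window=10, overlap=5):
    """Return (start index, lower bound, median, upper bound) for each window."""
    step = window - overlap  # how far the window moves each time
    rows = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        # Quartiles stand in for the bounds a GNDF would provide.
        lower, median, upper = quantiles(chunk, n=4)
        rows.append((start, lower, median, upper))
    return rows


# Hypothetical time series of 20 values.
series = list(range(1, 21))
for start, lower, median, upper in sliding_bounds(series):
    print(f"values {start}..{start + 9}: lower={lower} median={median} upper={upper}")
```

Plotting the three resulting curves against the window position gives the band that describes the whole series at a glance.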