Skip to content

Buddy check

Cristian Lussana edited this page Mar 24, 2022 · 32 revisions

The buddy check compares an observation against its neighbours (i.e. buddies) and flags outliers.

The check looks for buddies in a neighbourhood specified by radius [m], which is the radius of a circle around the observation to be checked. A minimum number of observations (num_min) is required to be available inside the circle and the range of elevations in the circle must not exceed max_elev_diff meters . The number of iterations is set by num_iterations.

The buddy check flags observations if the (absolute value of the) difference between the observations and the average of the neighbours normalized by the standard deviation in the circle is greater than a predefined threshold. If the standard deviation of values in the neighbourhood is less than min_std, then a value of min_std is used instead. min_std should be roughly equal to the standard deviation of the error of a typical observation. If it is too low, then too many observations will be flaged in areas where the variability is low.

In the case of temperature, elevation differences should be taken into account because all observations are reported to the elevation of the centroid observation before averaging. A linear vertical rate of change of temperature can be set by elev_gradient. A recommended value is elev_gradient=-0.0065 °C/m (as defined in the ICAO international standard atmosphere). If max_elev_diff is negative then don't check elevation difference and do not correct the observed values.

It is possible to specify an optional vector obs_to_check to specify whether an observation should be checked. The length of obs_to_check must be the same as the vector with the values to check. The buddy check is performed only for values where the corresponding obs_to_check element is set to 1, while all values are always used as buddies for checking the data quality.

Python examples

radius = np.full(points.size(), 5000)
num_min = np.full(points.size(), 5)
threshold = 2
max_elev_diff = 200
elev_gradient = -0.0065
min_std = 1
num_iterations = 5

flags = titanlib.buddy_check(points, temp_obs, radius, num_min,threshold, max_elev_diff, elev_gradient, min_std, num_iterations)

R examples

# copy-and-paste from buddy_check_test.r
titanlib_path <- "../build"
dyn.load( file.path( titanlib_path, 
                     paste("swig/R/titanlib", .Platform$dynlib.ext, sep="")))
source(   file.path( titanlib_path,"swig/R/titanlib.R"))

#---------------------------------------------------------------
# Test with small vectors
P <- 11
lats <- rep( 60, P)
lons <- seq(10, length=P, by=0.005)
elevs <- rep(0, length(lats))
values <- round( 10*sin( lons * 2 * base::pi / ( max(lons) - min(lons))), 2)
# this is a bad observation
values[2]<-20
# check all obs
obs_to_check <- rep(1, length(lats))
# -- buddy check parameters --
radius <- 20000 # km
num_iterations <- 10
num_min <- 3 
threshold <- 2
max_elev_diff <- 100000 # m, if negative then elevation is not considered
elev_gradient <- -0.0065 # used only when max_elev_diff is positive
min_std <- 1
# call titanlib
points <- Points(lats, lons, elevs)
flags <- buddy_check(points, values, radius, num_min, threshold, max_elev_diff, elev_gradient, min_std, num_iterations)
print(flags)

#------------------------------------------
# Test with bigger vectors
P <- 50000
pGE <- 0.3 # probability of gross error 0.3 = 30%
lats <- runif(P, min = 55, max = 60)
lons <- runif(P, min = 5, max = 10)
elevs <- runif(P, min = 0, max = 2500)
# simple vertical profile
values <- 30 - 0.0065 * elevs
idx <- sample(1:P,ceiling(P*pGE))
true_flags <- values; true_flags[] <- 0
true_flags[idx] <- 1
values[idx]<-runif(ceiling(P*pGE), min = -50, max = 50)
obs_to_check = rep(1,P)
# buddy check parameters
radius <- 20000
num_iterations <- 10
num_min <- 3
threshold <- 2
max_elev_diff <- 100000
elev_gradient <- -0.0065
min_std <- 1
# call titanlib
points <- Points(lats, lons, elevs)
t0<-Sys.time()
flags <- buddy_check(points, values, radius, num_min, threshold, max_elev_diff, elev_gradient, min_std, num_iterations)
t1<-Sys.time()
print(t1-t0)
a <- length( which( true_flags == 1 & flags == 1)) # hit (correct bad)
c <- length( which( true_flags == 1 & flags == 0)) # miss (false good)
b <- length( which( true_flags == 0 & flags == 1)) # false bad
d <- length( which( true_flags == 0 & flags == 0)) # correct good
rand <- (a+c) * (a+b) / (a+b+c+d)
ets <- (a-rand) / (a+b+c-rand)
acc <- (a+d)/(a+b+c+d)
pod <- a/(a+c)
pofa <- b/(b+d)
cat( paste0( "TOT / corr.bad(true_bad) false_bad false_good  corr.good: ",a+b+c+d," / ",a," (",length(which( true_flags == 1)),") ",b," ",c," ",d,"\n"))
cat( paste("scores / acc pod pofa ets:", round(acc,2), round(pod,2), round(pofa,2), round(ets,2),"\n"))