Price sharing - price sanity check algorithms


Price sharing - price sanity check algorithms. Discussion thread.

There are quite a few online resources that collect/crowdsource and distribute/share price data. During entry or collection, bad or invalid data sometimes gets captured.

I am interested in discussing algorithms for cleaning, verifying, or sanity-checking price data.

For each commodity, I guess there is a sensible range for the buy price and the sell price. The idea of course is to eliminate/ignore prices that are outside these ranges. But what methodology do we use to establish the valid ranges? How do we know if a price makes sense? Assuming we have a lot of data in a database, we can find the average buy/sell and the min/max buy/sell. We can get the standard deviation too. Does it mean anything in this context, and can it be used to establish a valid range?
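To make the standard deviation question concrete, here is a minimal sketch of two ways a range could be derived from stored prices for one commodity. It is not a recommendation. The function names, the sample prices, and the multiplier k are all made up for illustration. The second function swaps in a different technique, the median and the median absolute deviation (MAD), which is less affected by the bad values we are trying to catch.

```python
import statistics

def stddev_range(prices, k=3.0):
    """Range as mean +/- k population standard deviations (k is an assumption)."""
    mean = statistics.fmean(prices)
    sd = statistics.pstdev(prices)
    return mean - k * sd, mean + k * sd

def mad_range(prices, k=3.0):
    """Range as median +/- k scaled median absolute deviations."""
    med = statistics.median(prices)
    mad = statistics.median([abs(p - med) for p in prices])
    # 1.4826 makes the MAD comparable to a standard deviation for
    # normally distributed data; real price data may not be normal.
    scaled = 1.4826 * mad
    return med - k * scaled, med + k * scaled

# Hypothetical sell prices for one commodity, with one bad entry.
sample_prices = [100, 102, 98, 101, 99, 103, 97, 10000]

sd_low, sd_high = stddev_range(sample_prices)
mad_low, mad_high = mad_range(sample_prices)
print("stddev range:", sd_low, sd_high)
print("MAD range:   ", mad_low, mad_high)
```

With this sample the bad value inflates both the average and the standard deviation so much that the 3-sigma range still contains 10000, while the median-based range excludes it. So the standard deviation does mean something here, but it is itself distorted by the data it is supposed to police.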

If we decide what a valid range is for a buy price or a sell price, do we delete everything outside that range at once, OR do we delete one value (the max or min) and then recalculate the range?
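The two options can be compared side by side. The sketch below assumes a range of average +/- k standard deviations, purely as an example. The function names, k=2, the min_keep guard, and the sample prices are all hypothetical.

```python
import statistics

def bounds(prices, k):
    """Range as mean +/- k population standard deviations."""
    mean = statistics.fmean(prices)
    sd = statistics.pstdev(prices)
    return mean - k * sd, mean + k * sd

def filter_once(prices, k=2.0):
    """Option 1: compute the range once and drop everything outside it."""
    low, high = bounds(prices, k)
    return [p for p in prices if low <= p <= high]

def filter_iteratively(prices, k=2.0, min_keep=3):
    """Option 2: drop the single most extreme outlier, then recalculate."""
    kept = list(prices)
    while len(kept) > min_keep:
        low, high = bounds(kept, k)
        outliers = [p for p in kept if not (low <= p <= high)]
        if not outliers:
            break
        mean = statistics.fmean(kept)
        # Remove only the value furthest from the current average.
        worst = max(outliers, key=lambda p: abs(p - mean))
        kept.remove(worst)
    return kept

# Hypothetical prices with two bad entries of different size.
sample_prices = [100, 102, 98, 101, 99, 103, 97, 5000, 10000]

once = filter_once(sample_prices)
iterated = filter_iteratively(sample_prices)
print("one pass: ", once)
print("iterative:", iterated)
```

On this sample the single pass removes 10000 but keeps 5000, because the larger bad value widens the range enough to hide the smaller one. Recalculating after each removal catches both. The trade-off is that repeated trimming with a small k can also start removing legitimate prices, so some stop condition (here min_keep) seems necessary.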

As a first step, if both the buy and the sell price are 0, the price is ignored/deleted. I will post some algorithms I have considered in another message. What do others use?
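The zero/zero rule is simple enough to sketch directly. The record layout below (commodity, buy, sell) and the commodity names are assumptions for illustration, not the schema of any particular price-sharing tool.

```python
from typing import NamedTuple

class PriceRecord(NamedTuple):
    # Hypothetical record layout for one reported price.
    commodity: str
    buy: int
    sell: int

def drop_zero_records(records):
    """Ignore records where both the buy and the sell price are 0."""
    return [r for r in records if not (r.buy == 0 and r.sell == 0)]

reports = [
    PriceRecord("Gold", 9000, 9400),
    PriceRecord("Gold", 0, 0),     # dropped: both prices are 0
    PriceRecord("Gold", 0, 9350),  # kept: only one side is 0
]

cleaned = drop_zero_records(reports)
print(cleaned)
```

A record with only one side at 0 is kept here, since the rule as stated only covers the case where both are 0.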



  • Apr 27, 2015 (wolverine2710) First revision