random NaN values #839
Comments
Sounds like a bug with how you're describing it, or at least a problematic use of Can you send me input quotes, output results, and indicator param info where this happens and is a problem -- so I can reproduce? I've not seen this happen in a problematic manner in my test suite or personal use cases and would like to capture it from you.
It's the BTC chart on 5min and there are no zeros or nulls; the chart is just cut in a few places where the exchange was down. It happens with random indicators after calculating about 40 indicators one after another: sometimes with BOP, sometimes ALMA and SMA. With the older version 1.23 I don't get any NaN values.
You can also try with a smaller chart and remove a few candlesticks in random locations. I think the problem may be caused by the cut chart.
Is this just when using chained indicators, or for regular use (as before)?

    // chained example
    var results = quotes
        .Use(CandlePart.Close)
        .GetSma(10);

    // regular/direct example
    var results = quotes
        .GetSma(10);
I am using the C# NuGet package; I just export the data to HDF5 and import it into Python.
In theory, @mihakralj had convinced me that I'll be able to do this by (note to self):
Yah. This is what I hoped wouldn't happen. The library itself is not buggy, but your use case in the chart option just doesn't handle With that said, I'll stop using |
@evo11x can you narrow down where you’re getting this error, to some dates and the offending indicator? I’m not super interested in downloading and concatenating 500 individual text files. A little help is appreciated.
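To narrow a report like this down to dates and an indicator, a small scan over each result set is usually enough. The sketch below is a hypothetical helper (`NanLocator.FindNaN` is not part of the library); it takes selector functions, so it makes no assumption about the property names of any result type.

```csharp
using System;
using System.Collections.Generic;

public static class NanLocator
{
    // Hypothetical helper, not a library API.
    // Returns the dates of all entries whose selected value is NaN.
    // Null values (for example, warm-up periods) are not reported.
    public static List<DateTime> FindNaN<T>(
        IEnumerable<T> results,
        Func<T, DateTime> date,
        Func<T, double?> value)
    {
        var hits = new List<DateTime>();

        foreach (T r in results)
        {
            double? v = value(r);

            if (v.HasValue && double.IsNaN(v.Value))
            {
                hits.Add(date(r));
            }
        }

        return hits;
    }
}
```

Assuming the result type exposes a `Date` and a nullable `Sma` property, a call might look like `NanLocator.FindNaN(quotes.GetSma(10), r => r.Date, r => r.Sma)`. Running it once per indicator gives the dates and the offending indicator in one pass.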
@DaveSkender I will try to export the exact chart from my serialized class, because I don't have a CSV yet. The indicators should not produce NaN or null values as long as the chart does not have null or zero values. Dealing with NaN or null values is not that easy, because you need to replace them with a value that can be good or bad for analysis; it also adds processing time. Your library is currently the best because of the many modern indicators included and its flexibility; it is also easy to use and well documented.
This is really an exercise in picking the least worst option. Incalculable periods in the timeline need to have some placeholder value if we want to keep our timeline length parity between input quotes and output results. The classic placeholder value for incalculable or unknown, without context, are
Thank you. ❤️
In my opinion, the best replacement for a NaN/null would be the value next to it, and if there are two or more NaN/null values, an average of the nearest values would be the best option. For example, 10, 15, NaN could become 10, 15, 15. The very best option, though more complicated, would be to create a gradient across the missing numbers, so that 5, NaN, NaN, 8 becomes 5, 6, 7, 8. This can also help with any other indicators applied over these values. NaN, null, or 0 between data points causes serious problems for AI models.
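The gradient idea can be sketched as a standalone cleaner applied to a finished result series. This is only an illustration under stated assumptions: `GapFiller.FillGaps` is a hypothetical name, not a library function, and the choice to copy the nearest valid value into leading and trailing gaps is mine.

```csharp
using System;
using System.Collections.Generic;

public static class GapFiller
{
    // Hypothetical helper, not a library API.
    // Replaces null/NaN entries: interior gaps get a linear gradient
    // between the neighboring valid values; leading and trailing gaps
    // copy the nearest valid value.
    public static double[] FillGaps(IReadOnlyList<double?> values)
    {
        int n = values.Count;
        double[] filled = new double[n];
        int prev = -1; // index of the most recent valid value

        for (int i = 0; i < n; i++)
        {
            double? v = values[i];
            if (!v.HasValue || double.IsNaN(v.Value)) continue;

            filled[i] = v.Value;

            if (prev < 0)
            {
                // leading gap: back-fill with the first valid value
                for (int j = 0; j < i; j++) filled[j] = v.Value;
            }
            else
            {
                // interior gap: equal steps from filled[prev] to v
                double start = filled[prev];
                double step = (v.Value - start) / (i - prev);
                for (int j = prev + 1; j < i; j++)
                {
                    filled[j] = start + step * (j - prev);
                }
            }

            prev = i;
        }

        // trailing gap: carry the last valid value forward
        // (stays NaN if the series has no valid value at all)
        for (int j = prev + 1; j < n; j++)
        {
            filled[j] = prev >= 0 ? filled[prev] : double.NaN;
        }

        return filled;
    }
}
```

With this sketch, 5, NaN, NaN, 8 becomes 5, 6, 7, 8 and 10, 15, NaN becomes 10, 15, 15. Interior gaps can only be filled once the next valid value is known, so it needs the completed series. Back-filling the leading gap also overwrites an indicator's warm-up nulls, which may not be wanted.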
Hmmm. Interesting idea, to use interpolation and possibly recursion. This is a very expensive operation and probably not something everyone wants (numerical guessing), so I’d not implement it as the default model, but can potentially see it as a utility cleaner for results. It’d definitely not work for incremental calculations and quote streaming use cases.
Here is a 15min chart with which I also get NaN values, at around index 20k; it is also smaller than the 5min one.
Replacing invalid numbers with interpolated numbers is usually not a good idea. I'd prefer we create 'cleansing' functions/indicators that purposefully do that on a dataset. Alternatively, we could use an optional parameter in indicators to tell them how to handle invalid numbers: zero them out, null them, NaN them, or interpolate them.
Agreed. I think a discretionary utility is best. I'm not a fan of complicating parameters for individual indicators.
@DaveSkender a separate set of normalizing utilities would be absolutely preferred. Cleansing, re-normalizing (squashing the dataset into the [-1.0 : 1.0] range), de-spiking, inserting interpolated data into empty timespan slots... This is not new; time series analysis libraries all have such utilities defined. We just need to rewrite them in C# (preferably streaming-ready, not just backtesting-ready).
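As a rough illustration of the re-normalizing utility mentioned here, the sketch below applies a plain min-max rescale into [-1.0, 1.0]. `Rescaler.ToUnitRange` is a hypothetical name, and min-max scaling is just one possible choice of technique; nothing in this thread specifies which one the library would use.

```csharp
using System;
using System.Collections.Generic;

public static class Rescaler
{
    // Hypothetical helper, not a library API.
    // Min-max rescale into [-1.0, 1.0]. NaN entries are ignored when
    // finding the range and are left as NaN in the output.
    public static double[] ToUnitRange(IReadOnlyList<double> values)
    {
        double min = double.PositiveInfinity;
        double max = double.NegativeInfinity;

        foreach (double v in values)
        {
            if (double.IsNaN(v)) continue;
            if (v < min) min = v;
            if (v > max) max = v;
        }

        double span = max - min;
        double[] scaled = new double[values.Count];

        for (int i = 0; i < values.Count; i++)
        {
            double v = values[i];

            if (double.IsNaN(v)) scaled[i] = double.NaN;
            else if (span == 0) scaled[i] = 0.0; // flat series: avoid 0/0
            else scaled[i] = 2.0 * (v - min) / span - 1.0;
        }

        return scaled;
    }
}
```

This version needs the whole dataset to find the minimum and maximum, so it is backtesting-ready only; a streaming-ready variant would have to fix the range in advance or track it over a rolling window.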
Yep. I have a few new ones in mind to add to the utility kit.
Great! Having these options in the utility kit is perfect! I also don't know any other library which does this.
quotes: btcusdnan.csv @evo11x what indicator and parameter arguments are you using with this data to get results: btcusd-sma.csv |
IEnumerable<AlmaResult> alma9Results = quotes.GetAlma(9, 0.85, 6);
So far, BOP is the only one I can reproduce, the others work without producing |
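For context on why BOP in particular can yield `NaN`: the raw Balance of Power value for a candle is (Close - Open) / (High - Low), and with `double` arithmetic a candle where High equals Low divides by zero, which gives `NaN` when the numerator is also zero. Whether that is the cause in this dataset is not confirmed in this thread; the sketch below only demonstrates the arithmetic, using a hypothetical `BopSketch.RawBop` helper that is not the library's implementation.

```csharp
using System;

public static class BopSketch
{
    // Hypothetical helper, not the library's implementation.
    // Raw (unsmoothed) Balance of Power for one candle.
    // Returns null for a zero-range candle instead of letting
    // the division produce NaN or infinity.
    public static double? RawBop(double open, double high, double low, double close)
    {
        double range = high - low;

        if (range == 0)
        {
            return null;
        }

        return (close - open) / range;
    }
}
```

Returning null for the zero-range case is a design choice made for this sketch; any average taken over a window that contains an unguarded `NaN` would itself become `NaN`.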
This was released in v2.0.3 |
The latest version, 2.0.2, introduces random NaN values when calculating several indicators one after another. I had to downgrade to 1.23 to get rid of this problem.
The NaN values appear at random locations and in random indicators after 15-20 indicators are calculated on a 40k candlestick chart. I am using .NET Framework 4.8.