random NaN values #839

Closed
evo11x opened this issue Jun 29, 2022 · 23 comments

@evo11x

evo11x commented Jun 29, 2022

The latest version, 2.0.2, introduces random NaN values when calculating several indicators one after another. I had to downgrade to 1.23 to get rid of this problem.

The NaN values appear at random locations and in random indicators after 15-20 indicators have been calculated on a 40k candlestick chart. I am using .NET Framework 4.8.

@evo11x evo11x added the bug Something isn't working label Jun 29, 2022
@DaveSkender
Owner

DaveSkender commented Jun 29, 2022

Sounds like a bug with how you're describing it, or at least a problematic use of NaN.

Can you send me input quotes, output results, and indicator param info where this happens and is a problem -- so I can reproduce? I've not seen this happen in a problematic manner in my test suite or personal use cases and would like to capture it from you.

NaN values should only happen in rare cases where div/0 would otherwise happen or when chaining with null values in the body of the prior time series.

@evo11x
Author

evo11x commented Jun 29, 2022

It's the BTC chart on 5min. There are no zeros or nulls; the chart is just cut in a few places where the exchange was down.
https://data.binance.vision/?prefix=data/spot/daily/klines/BTCUSDT/5m/
You have to join the files to get a history of 2 years.

It happens with random indicators after calculating about 40 indicators one after another, sometimes with BOP, sometimes ALMA and SMA.

With the older version 1.23 I don't get any NaN values.

@evo11x
Author

evo11x commented Jun 29, 2022

You can also try with a smaller chart and remove a few candlesticks at random locations. I think the problem might be caused by the cut chart.
Try putting 40-50 indicators on that chart (with a few candles removed) and check the indicator values for invalid numbers, NaN or inf.
I exported the data to Python and saw the NaN values there. I am not sure whether they are NaN or inf in C#.
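
A minimal sketch of that check on the Python side, assuming the exported indicator values have already been loaded into a plain list of floats (the helper name `find_non_finite` and the sample values are made up for illustration). It separates NaN from inf, and it skips `None` entries on the assumption that those are ordinary warm-up periods:

```python
import math

def find_non_finite(values):
    """Return (index, value) pairs for every NaN or +/-inf entry.

    None entries are skipped; they are assumed to be warm-up periods.
    """
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and not math.isfinite(v)
    ]

# Hypothetical exported values, not taken from the real chart.
sma = [None, None, 101.5, float("nan"), 102.0, float("inf")]

for index, value in find_non_finite(sma):
    kind = "NaN" if math.isnan(value) else "inf"
    print(index, kind)
```

Reporting the index along with the kind makes it easier to map a bad value back to a candle date and an indicator.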

@DaveSkender
Owner

Is this just when using chained indicators, or for regular use (as before)?
Also, are you using the Python PyPI package or the C# NuGet package?

// chained example
var chainedResults = quotes
  .Use(CandlePart.Close)
  .GetSma(10);

// regular/direct example
var directResults = quotes
  .GetSma(10);

@evo11x
Author

evo11x commented Jun 29, 2022

IEnumerable<SmaResult> sma50Results = quotes.GetSma(50);

@evo11x
Author

evo11x commented Jun 29, 2022

I am using the C# NuGet package. I just export the data to HDF5 and import it into Python.

@DaveSkender
Owner

DaveSkender commented Jun 29, 2022

In theory, NaN is the appropriate thing to return in cases where a div/0 would occur. This can happen randomly in some of these indicators if there's a very specific and rare combination of calculations. Previously, I simply returned null in these cases.

@mihakralj had convinced me that NaN is the better option, and on paper it is; it makes sense to me too. To be honest, I may not have understood it well enough. If it's giving users a problem, in that it's too difficult to handle gracefully, I'll probably go back to the null convention.

I'll be able to do this by (note to self):

  • add unit tests to replicate the problem with data provided in URL above
  • revert to null in indicator calculations where double.NaN is used
  • replace NaN in chained SyncIndex() plugs, possibly rename this internal function to RestoreChain()

@DaveSkender
Owner

DaveSkender commented Jun 29, 2022

> You can also try with a smaller chart and remove a few candlesticks at random locations. I think the problem might be caused by the cut chart. Try putting 40-50 indicators on that chart (with a few candles removed) and check the indicator values for invalid numbers, NaN or inf. I exported the data to Python and saw the NaN values there. I am not sure whether they are NaN or inf in C#.

Yah. This is what I hoped wouldn't happen. The library itself is not buggy, but your use case in the chart option just doesn't handle NaN well. I suspect there are a lot of cases where null just works better, without special handling. Technically speaking, you could just replace the double.NaN values with null in your results before putting them in your chart.

With that said, I'll stop using NaN to reduce friction with users.

@DaveSkender
Owner

@evo11x can you narrow down where you’re getting this error, to some dates and the offending indicator? I’m not super interested in downloading and concatenating 500 individual text files. A little help is appreciated.

@evo11x
Author

evo11x commented Jun 30, 2022

@DaveSkender I will try to export the exact chart from my serialized class, because I don't have a CSV yet

The indicators should not produce NaN or null values as long as the chart does not have null or zero values. Dealing with NaN or null values is not that easy, because you need to replace them with a value that can be good or bad for analysis, and it also adds processing time.

Your library is currently the best because of the many modern indicators included, its flexibility, and its ease of use. It is also well documented.

@DaveSkender
Owner

> The indicators should not produce NaN or null values as long as the chart does not have null or zero values. Dealing with NaN or null values is not that easy, because you need to replace them with a value that can be good or bad for analysis, and it also adds processing time.

This is really an exercise in picking the least worst option. Incalculable periods in the timeline need to have some placeholder value if we want to keep timeline length parity between input quotes and output results. The classic placeholder values for incalculable or unknown, without context, are null, NaN, or 0. Personally, I think null is the least worst option. Do you have any other recommendations?

> Your library is currently the best because of the many modern indicators included, its flexibility, and its ease of use. It is also well documented.

Thank you. ❤️

@evo11x
Author

evo11x commented Jun 30, 2022

In my opinion, the best replacement for a NaN/null would be the value next to it, and if there are two or more NaN/null values, an average of the nearest values would be the best option.

10, 15, NaN could be 10, 15, 15
or 10, 15, NaN, NaN, 20, 10 could use the average of 15 and 20, like this: 10, 15, 17.5, 17.5, 20, 10

The very best option, though more complicated, would be to create a gradient across the missing numbers, so that 5, NaN, NaN, 8 becomes 5, 6, 7, 8. This can also help with any other indicators applied over these values.

NaN, null, or 0 between data points messes very badly with AI.
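
The gradient idea can be sketched in a few lines of Python, applied to values after export. This is only an illustration, not anything the library provides: `fill_gaps` is a hypothetical helper, it assumes evenly spaced candles, and it treats both `None` and NaN as missing. Interior gaps get a straight line between their known neighbours, a trailing gap repeats the last known value, and leading gaps are left alone on the assumption that they are warm-up periods.

```python
import math

def fill_gaps(values):
    """Fill missing entries (None or NaN) in a list of numbers.

    Interior gaps: linear interpolation between the known neighbours.
    Trailing gap: repeat the last known value.
    Leading gap: left untouched (assumed to be warm-up periods).
    """
    def missing(v):
        return v is None or math.isnan(v)

    filled = list(values)
    known = [i for i, v in enumerate(filled) if not missing(v)]
    if not known:
        return filled  # nothing to anchor on

    # Interior gaps: straight line between each pair of known points.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)

    # Trailing gap: carry the last known value forward.
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]

    return filled

nan = float("nan")
print(fill_gaps([5, nan, nan, 8]))  # [5, 6.0, 7.0, 8]
print(fill_gaps([10, 15, nan]))     # [10, 15, 15]
```

Note that the filled values are guesses, and the interior case needs the next known value, so it only works on a complete history, not on a live stream.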

@DaveSkender
Owner

DaveSkender commented Jun 30, 2022

> In my opinion, the best replacement for a NaN/null would be the value next to it, and if there are two or more NaN/null values, an average of the nearest values would be the best option.

Hmmm. Interesting idea, to use interpolation and possibly recursion. This is a very expensive operation and probably not something everyone wants (numerical guessing), so I’d not implement it as the default model, but can potentially see it as a utility cleaner for results. It’d definitely not work for incremental calculations and quote streaming use cases.

@evo11x
Author

evo11x commented Jun 30, 2022

Here is a 15min chart with which I also get the NaN values at around index 20k, it is also smaller than the 5min
https://ufile.io/41ir48zu

@mihakralj
Contributor

Replacing invalid numbers with interpolated numbers is usually not a good idea. I'd prefer we create 'cleansing' functions/indicators that purposefully do that on a dataset.

Alternatively, we could use an optional parameter in indicators to specify how to handle invalid numbers: zero them out, null them, NaN them, or interpolate them.

@DaveSkender
Owner

> Replacing invalid numbers with interpolated numbers is usually not a good idea. I'd prefer we create 'cleansing' functions/indicators that purposefully do that on a dataset.

Agreed. I think a discretionary utility is best. I'm not a fan of complicating parameters for individual indicators.

@mihakralj
Contributor

@DaveSkender A separate set of normalizing utilities would be absolutely preferred: cleansing, re-normalizing (squashing the dataset into the [-1.0, 1.0] range), de-spiking, inserting interpolated data into empty timespan slots... This is not new; time series analysis libraries have all such utilities defined. We just need to rewrite them in C# (preferably streaming-ready, not just backtesting-ready).
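
As a rough illustration of the re-normalizing step, here is a min-max rescale into [-1.0, 1.0], sketched in Python for brevity rather than C#. The function name `rescale` is hypothetical, and the sketch assumes the input is already cleansed (no NaN or null). It is a batch version only: it needs the minimum and maximum of the whole series, so it is not streaming-ready as written.

```python
def rescale(values, low=-1.0, high=1.0):
    """Min-max rescale a list of finite numbers into [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no spread; map it to the middle of the range.
        return [(low + high) / 2 for _ in values]
    scale = (high - low) / (hi - lo)
    return [low + (v - lo) * scale for v in values]

print(rescale([10, 15, 20]))  # [-1.0, 0.0, 1.0]
```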

@DaveSkender
Owner

Yep. I have a few new ones in mind to add to the utility kit.

@evo11x
Author

evo11x commented Jul 1, 2022

Great! Having these options in the utility kit is perfect! I also don't know any other library which does this.

@DaveSkender DaveSkender mentioned this issue Jul 2, 2022
9 tasks
@DaveSkender DaveSkender self-assigned this Jul 2, 2022
@DaveSkender DaveSkender moved this from Maybe to In Progress in Stock Indicators for .NET Jul 2, 2022
@DaveSkender
Owner

DaveSkender commented Jul 2, 2022

> Here is a 15min chart with which I also get the NaN values at around index 20k, it is also smaller than the 5min

quotes: btcusdnan.csv

@evo11x what indicator and parameter arguments are you using with this data to get NaN? I'm not seeing any with quotes.GetSma(50).

results: btcusd-sma.csv

DaveSkender added a commit that referenced this issue Jul 2, 2022
@evo11x
Author

evo11x commented Jul 2, 2022

IEnumerable<AlmaResult> alma9Results = quotes.GetAlma(9, 0.85, 6);
IEnumerable<AlmaResult> alma20Results = quotes.GetAlma(20, 0.85, 6);
IEnumerable<RsiResult> rsiResults = quotes.GetRsi(14);
IEnumerable<BopResult> bop50Results = quotes.GetBop(50);
With SMA I think it was 7. I will try it again with the new version when I have some time.

@DaveSkender
Owner

DaveSkender commented Jul 2, 2022

So far, BOP is the only one I can reproduce, the others work without producing NaN values. I'm using ~v2.0.2
With that said, I do see quite a few places where NaN is possible and am changing those to generate null instead.

@DaveSkender DaveSkender moved this from In Progress to Shippable in Stock Indicators for .NET Jul 3, 2022
Repository owner moved this from Shippable to Done in Stock Indicators for .NET Jul 3, 2022
@DaveSkender
Owner

This was released in v2.0.3

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 2, 2022