HIVE-27944: When Hive LLAP reads an Iceberg table, a deadlock may occur. #4946
@deniskuzZ Hi, I resubmitted my PR and made the changes you suggested.
@deniskuzZ However, although this solves the problem, I have another concern. If we use a static cache, the elements in SplitGroup.cache[] will never be deleted. When the user changes the partition, the elements here are not updated, so is there a problem?
@zhangbutao Hi butao. I thought we could discuss this together.
Yes, I think the static cache will never be cleaned up for LLAP, so this may lead to some issues. But it seems this cache has been there for a long time (HIVE-8409 -> HIVE-9976), and I don't have deep knowledge of this code at present. I'd like to hear others' opinions.
Yeah. Currently, the solution I'm using in our production environment is a non-static cache. So far it's working fine, and I haven't observed any OOM issues due to the large number of partitions. Incidentally, our Iceberg table has close to 100,000 partitions.
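To make the static vs. non-static concern concrete, here is a minimal single-threaded sketch. `CacheScopeSketch`, its two lookup methods, and the path and descriptor strings are all hypothetical names for illustration, not Hive's actual code. It only shows that a static map keeps serving an entry created by an earlier instance, while an instance-scoped map starts empty.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a helper that caches partition descriptors by path.
// This is NOT Hive's real class; it only illustrates cache scope.
class CacheScopeSketch {
    // Shared by every instance in the JVM; nothing ever removes entries.
    static final Map<String, String> STATIC_CACHE = new HashMap<>();

    // Scoped to one instance; discarded together with that instance.
    final Map<String, String> instanceCache = new HashMap<>();

    // Returns the cached descriptor, caching currentDesc only on a miss.
    String lookupStatic(String path, String currentDesc) {
        return STATIC_CACHE.computeIfAbsent(path, p -> currentDesc);
    }

    String lookupInstance(String path, String currentDesc) {
        return instanceCache.computeIfAbsent(path, p -> currentDesc);
    }

    public static void main(String[] args) {
        CacheScopeSketch first = new CacheScopeSketch();
        first.lookupStatic("/warehouse/t/p=1", "desc-v1");
        first.lookupInstance("/warehouse/t/p=1", "desc-v1");

        // Assume the partition metadata changed to desc-v2 before the next use.
        CacheScopeSketch second = new CacheScopeSketch();
        // The static cache still returns the old entry.
        System.out.println(second.lookupStatic("/warehouse/t/p=1", "desc-v2"));   // desc-v1
        // The instance cache is empty, so the current value is used.
        System.out.println(second.lookupInstance("/warehouse/t/p=1", "desc-v2")); // desc-v2
    }
}
```

Whether a stale entry like this actually causes wrong results in LLAP depends on what the real cache key and value contain, which this sketch does not model.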
@zhangbutao Hi. butao. |
@deniskuzZ Hi. Can we discuss this issue further? I'm worried that we'll forget about it over time. Thanks.
@deniskuzZ If it is indeed too late to discuss and review this patch in the near future, then please at least add it to https://issues.apache.org/jira/browse/HIVE-27945
Apologies @BsoBird, it's been a busy week; please give me some time to validate this. Note: we'll probably fast-forward branch-4 in Jan to include recent bug fixes from master.
Any noticeable performance degradation?
Could we replace this map-based cache with a Caffeine cache?
I think... no. At least for now, I think LLAP runs faster than Spark. Maybe I just haven't observed the impact of using non-static caching.
I don't think we can do that: if we just change it to Caffeine, it will still trigger properties.equals().
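A small sketch of why swapping the map implementation alone does not avoid the key comparison. It uses only JDK maps, not Caffeine itself, and the `Key` class with its call counter is a hypothetical stand-in for a cache key whose `equals()` is costly or takes locks. Any hash-based lookup with an equal but distinct key instance has to call `equals()` to confirm the match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class EqualsCallSketch {
    // Counts how often Key.equals() runs.
    static final AtomicInteger EQUALS_CALLS = new AtomicInteger();

    // Hypothetical cache key; equals() stands in for a comparison such as
    // properties.equals() that we would prefer not to trigger.
    static final class Key {
        final String path;

        Key(String path) {
            this.path = path;
        }

        @Override
        public int hashCode() {
            return path.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            EQUALS_CALLS.incrementAndGet();
            return o instanceof Key && ((Key) o).path.equals(path);
        }
    }

    // Returns how many times Key.equals() ran during one lookup that uses
    // an equal but distinct key instance.
    static int equalsCallsDuringLookup(Map<Key, String> cache) {
        cache.put(new Key("/warehouse/t/p=1"), "desc");
        EQUALS_CALLS.set(0);
        cache.get(new Key("/warehouse/t/p=1"));
        return EQUALS_CALLS.get();
    }

    public static void main(String[] args) {
        // Both map implementations must call equals() to resolve the lookup.
        System.out.println(equalsCallsDuringLookup(new HashMap<>()) > 0);           // true
        System.out.println(equalsCallsDuringLookup(new ConcurrentHashMap<>()) > 0); // true
    }
}
```

So if the comparison itself is the problem, the key would have to change, not only the container.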
That's great news, sir.
The cache is currently reused between multiple tasks. Caffeine was proposed as a way to define an eviction policy. I am checking the code cause existing structure
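For reference, a minimal sketch of what "an eviction policy" means here, using a JDK `LinkedHashMap` as a stand-in rather than Caffeine. `BoundedCacheSketch` and the limit of 2 entries are assumptions for illustration only. This version is not thread-safe, so a cache shared between tasks would need external synchronization or a concurrent cache library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU map: a JDK-only stand-in for a cache with an eviction policy.
class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put(); returning true drops the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, String> cache = new BoundedCacheSketch<>(2);
        cache.put("p=1", "desc1");
        cache.put("p=2", "desc2");
        cache.get("p=1");          // touch p=1, so p=2 becomes the eldest
        cache.put("p=3", "desc3"); // exceeds the limit and evicts p=2
        System.out.println(cache.keySet()); // [p=1, p=3]
    }
}
```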
Yeah, I think so... very strange
from HIVE-9976
Is there an existing test that makes use of that?
In the whole project there is just a single test that uses that cache, and not from mvn test -Dtest=TestHiveFileFormatUtils#testGetPartitionDescFromPathRecursively. We should add a q test that would trigger that part of the code. @BsoBird I think it's ok to remove
So, are we going to go with the first version of the solution?
Hi @deniskuzZ. I will reopen the previous PR if we are sure to go with the first version of the solution.
Wow! Definitely a good catch! Remove
@deniskuzZ @zhangbutao Actually, I think it would be ideal to provide a unified cache that all code needing the partitioned cache would reference. The cache needs to maintain correctness, similar to HIVE-16079. But this may require some extra work. We can fix this first and then open a JIRA to improve it.

fix #4935