Conversation
Force-pushed 3325fd5 to 659298d.
Would also love to see this or #16375 merged.
Since we use soft scheduling, this might end up unnecessarily caching data on nodes whenever soft scheduling fails. That is, if a piece of data A is supposed to be cached on node 1, but node 1 is too busy, it might be scheduled to run on node 2. Then node 2 will read A from cloud storage, and add it to its cache. However, the next time data A is supposed to be read, and node 1 is still too busy, we are just as likely to schedule the read to happen on node 3. So we will just end up consuming write capacity and cache capacity from node 2, without really getting any benefit of caching. This will especially be a problem in a cluster with a lot of nodes.
The nodes need a way to know if they are supposed to cache a piece of data or not, and fall back to not caching.
That might be possible to mitigate by populating a list of preferred host addresses in the connector split instead of just one host.
Would this potentially reduce the cache hit rate? For instance, let's say we have 100 nodes, each with a 1 TB cache, 100 TB of data we want to query, and all data is only cached on a single node. If soft scheduling never fails we will eventually have a 100% cache hit rate. If soft scheduling fails 10% of the time, 10% of the data will be scheduled on a random node not supposed to cache this data, and store "garbage" in its cache. Eventually 10% of all cached data is not cached on a node with the correct cache key. This should give roughly an 80% cache hit rate.
If on the other hand we cache all data on two nodes, then only 50 TB of the data we want to query can be cached. This should give a cache hit rate of 50%, even if soft scheduling never fails.
If nodes only cache data they are supposed to cache, we would get a 90% cache hit rate if soft scheduling fails 10% of the time.
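One way to make those estimates explicit (my reading of the arithmetic above, assuming the total cache is exactly as large as the dataset and that misplaced entries permanently occupy a matching share of each node's cache):

$$
\Pr[\text{hit}] \approx \underbrace{(1-p)}_{\text{split lands on its owner node}} \times \underbrace{(1-p)}_{\text{share of the owner's cache holding correctly keyed data}} = 0.9 \times 0.9 \approx 0.8
$$

If nodes decline to cache data they do not own, the second factor stays at 1 and the hit rate is simply 1 - p = 0.9, while replicating every item on two nodes halves the cacheable fraction of the dataset and caps the hit rate at 50% regardless of p.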
I don't think this is a problem as long as you deterministically provide the same list of preferred host addresses for a given split and the scheduler attempts to schedule splits on these hosts in the provided preferred order. I'm assuming that the probability that all the preferred hosts are too busy to schedule splits on is low with 3 preferred hosts. You might cache some data on multiple nodes, but that would be useful when the primary preferred node is too busy.
A couple of alternatives to this are:
- The worker node checks whether it is the preferred host for the split and uses the non-cached file system implementation when it's not. This would give up on caching if the preferred host is too busy to schedule splits on.
- The embedded cache implementation itself has the ability to remotely read cached data from any worker. This way even if a split gets scheduled on a node which didn't cache the data, it can still read the cached data from another node. This is the approach that Rubix takes. However, this is probably not worth the added complexity.
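To illustrate the "deterministic ordered list of preferred hosts" idea, here is a minimal, self-contained sketch using rendezvous (highest-random-weight) hashing. It is not the ConsistentHashingNodeProvider from this PR; the class and method names are made up, and a real implementation would use a stronger hash than String.hashCode().

```java
import java.util.Comparator;
import java.util.List;

public class PreferredHostsExample
{
    // Deterministically rank all workers for a cache key: the same key always yields
    // the same ordered list, so the primary owner is stable and so are the fallbacks.
    public static List<String> preferredHosts(String cacheKey, List<String> workers, int count)
    {
        return workers.stream()
                .sorted(Comparator.comparingLong((String worker) -> score(cacheKey, worker)).reversed())
                .limit(count)
                .toList();
    }

    private static long score(String cacheKey, String worker)
    {
        // Any stable hash works for the sketch; production code would use a better one
        return (cacheKey + "|" + worker).hashCode() & 0xffffffffL;
    }

    public static void main(String[] args)
    {
        List<String> workers = List.of("worker-1", "worker-2", "worker-3", "worker-4");
        // First entry is the primary cache owner; the rest are fallbacks when it is too busy
        System.out.println(preferredHosts("s3://bucket/part-0001.parquet", workers, 3));
    }
}
```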
It's a bit unfortunate that we have to do the caching on the file level, and not on the part-of-file level. With the current caching scheme a single large file from cloud storage will be cached on a single node. For a large and hot file this could potentially lead to problems. But it might be hard to solve, and might not be worth it.
@beinan could you please weigh in on whether this is possible to improve?
An update on this here - based on the feedback above, we're reworking the PR to:
We're planning to polish it a bit internally, then bring it up here for discussion, probably next week.
Amazing, thank you for this work.
Force-pushed 659298d to d5118f6.
Hey, a small update here. We have just pushed our latest changes with the promised refactor. There is likely still some work to do, especially around testing. Still, we hope this code structure can be a good basis for continuing the review. To summarize some of the changes and open questions:
If anyone wants to test this PR with other connectors, please be aware that the …
Force-pushed d5118f6 to 5684fe2.
Thanks for your work on this. Can you move … Is the Alluxio cache code fundamentally tied to Hadoop? I see …
Force-pushed 5684fe2 to 8132290.
I've moved it to a new module …
@jkylling can you rebase to resolve the conflict?
Yes, I'm looking at it. It's a bit tricky, as the alluxio-shaded client contains a lot of classes which can overlap with other dependencies, and rebasing on master has brought about a new set of duplicate resources. I'm trying to switch to using non-shaded Alluxio libraries instead.
@jkylling we should use the shaded version if possible.
mosabua left a comment:
Just a few suggestions for the javadocs while I work on the documentation PR.
Btw, this is the extension of this PR for the Iceberg connector, in case somebody is interested: amoghmargoor@cdefc91
We definitely want to get this into the project as soon as we can as well. Ideally also for Hive and Hudi, but best to do it after this PR merges. Hopefully soon.
Makes sense. Looking forward to it. Thanks!
Thanks for all the hard work on this; we look forward to this being ready and merged. This will be a huge and much-needed feature.
We've been running with an earlier iteration of this PR for a while now, and see equal or better performance, depending on the volumes used for caching. With EBS we have seen equal performance, while with NVMe disks on the nodes we have seen maybe a 10%-20% speed-up on some benchmark workloads (@Pluies has the exact numbers). Typically we see worse performance while the cache is being populated, and then slightly better performance once the cache is warm. I did a very unscientific benchmark with a build using the latest iteration of this PR and got the numbers below (1 coordinator + 2 workers, m6g.8xlarge, 30 CPU, 115 Gi, each with a 2000 Gi EBS volume for caching): That is, with a cold cache, the scan of 84 GB takes ~38 seconds, with a warm cache it takes ~22 seconds, and without a cache it takes ~33 seconds.
Please rebase to the latest master as well.
private DataSize cachePageSize = DataSize.valueOf("1MB");

@NotNull
public List<@FileExists String> getCacheDirectories()
I'm not sure that checking for existence here is correct, as a previously working config now fails:
2024-01-31T14:51:10.4367179Z 2024-01-31T14:51:06.347Z ERROR main io.trino.server.Server Configuration errors:
2024-01-31T14:51:10.4367195Z
2024-01-31T14:51:10.4368303Z 1) Error: Invalid configuration property with prefix '': file does not exist: /opt/data (for class AlluxioFileSystemCacheConfig.cacheDirectories[0].<list element>)
2024-01-31T14:51:10.4368323Z
2024-01-31T14:51:10.4368437Z 1 error
2024-01-31T14:51:10.4368443Z
2024-01-31T14:51:10.4368577Z ======================
2024-01-31T14:51:10.4368712Z Full classname legend:
2024-01-31T14:51:10.4368848Z ======================
2024-01-31T14:51:10.4369391Z AlluxioFileSystemCacheConfig: "io.trino.filesystem.alluxio.AlluxioFileSystemCacheConfig"
2024-01-31T14:51:10.4369533Z ========================
2024-01-31T14:51:10.4369672Z End of classname legend:
2024-01-31T14:51:10.4369802Z ========================
2024-01-31T14:51:10.4369808Z
2024-01-31T14:51:10.4370288Z io.airlift.bootstrap.ApplicationConfigurationException: Configuration errors:
2024-01-31T14:51:10.4370378Z
2024-01-31T14:51:10.4371486Z 1) Error: Invalid configuration property with prefix '': file does not exist: /opt/data (for class AlluxioFileSystemCacheConfig.cacheDirectories[0].<list element>)
2024-01-31T14:51:10.4371493Z
2024-01-31T14:51:10.4371626Z 1 error
2024-01-31T14:51:10.4371631Z
2024-01-31T14:51:10.4371749Z ======================
2024-01-31T14:51:10.4371898Z Full classname legend:
2024-01-31T14:51:10.4372014Z ======================
2024-01-31T14:51:10.4372575Z AlluxioFileSystemCacheConfig: "io.trino.filesystem.alluxio.AlluxioFileSystemCacheConfig"
2024-01-31T14:51:10.4372700Z ========================
2024-01-31T14:51:10.4372854Z End of classname legend:
2024-01-31T14:51:10.4372988Z ========================
2024-01-31T14:51:10.4372994Z
2024-01-31T14:51:10.4373348Z at io.airlift.bootstrap.Bootstrap.configure(Bootstrap.java:217)
2024-01-31T14:51:10.4373726Z at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:246)
2024-01-31T14:51:10.4374592Z at io.trino.plugin.deltalake.InternalDeltaLakeConnectorFactory.createConnector(InternalDeltaLakeConnectorFactory.java:112)
2024-01-31T14:51:10.4375225Z at io.trino.plugin.deltalake.DeltaLakeConnectorFactory.create(DeltaLakeConnectorFactory.java:56)
Does Alluxio create the directory automatically if it doesn't exist?
All tests are green even though Trino won't start with the Alluxio cache configuration right now. That's why we need an end-to-end test in this PR.
Alluxio will create the directory if it does not exist. I've updated the validation mechanism to check if any parent directory is writable.
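For illustration, here is a rough, self-contained sketch of the kind of check described ("any parent directory is writable"), assuming the intent is to accept a configured cache directory whose closest existing ancestor is a writable directory; the class and method names are hypothetical and this is not the exact validation code in the PR.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public final class CacheDirectoryValidation
{
    private CacheDirectoryValidation() {}

    // Walk up from the configured directory to its closest existing ancestor and check
    // that it is a writable directory, since Alluxio creates missing directories itself.
    public static boolean hasWritableAncestor(Path directory)
    {
        Path current = directory.toAbsolutePath();
        while (current != null) {
            if (Files.exists(current)) {
                return Files.isDirectory(current) && Files.isWritable(current);
            }
            current = current.getParent();
        }
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(hasWritableAncestor(Path.of("/opt/data/LOCAL")));
    }
}
```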
I've added a product test which tests the configuration and the caching.
I encountered some problems when running the product tests. The Trino nodes launched by the product tests crashed with an error message saying that Java was not installed. The root cause turned out to be that the JDK downloaded by the product test to /tmp/ptl-tmp-download/temurin21-amd64 was corrupted and missing the library libjli.so. Deleting /tmp/ptl-tmp-download solved the issue. I'm not exactly sure how this state came about, but thought I'd write it down for posterity.
The error message on java.nio.file.NoSuchFileException: /opt/data/LOCAL seems to come from Alluxio's TTL mechanism. From what I can tell, this code will always throw when encountering an empty cache. It's not entirely clear whether this is harmless, as it might cause the scheduleAtFixedRate task to stop executing.
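For reference, the ScheduledExecutorService Javadoc states that if any execution of a task scheduled with scheduleAtFixedRate throws an exception, subsequent executions are suppressed. The standalone example below (unrelated to the Alluxio code) demonstrates that behavior: only one "tick" is printed before the task stops running.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SuppressedScheduleExample
{
    public static void main(String[] args)
            throws InterruptedException
    {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("tick");
            // Simulates a failure such as the NoSuchFileException seen on an empty cache
            throw new RuntimeException("simulated failure");
        }, 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(500);
        scheduler.shutdownNow();
        // Wrapping the task body in try/catch would keep the schedule alive despite failures
    }
}
```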
raunaqmorarka left a comment:
You can squash the last two commits.
Is the TODO "CheckpointEntryIterator is always closed" still pending? If it is, please include a link to an open GH issue.
It should be resolved by #20054 now. I'll enable the caching of checkpoint files.
Please make sure we include some test with checkpoint files in delta if we're doing that in this PR.
The TestDeltaLakeAlluxioCacheFileOperations does have coverage for the cache operations for checkpoint files, but it's unclear how much coverage on checkpoints we get through the TestDeltaLakeAlluxioCacheMinioAndHmsConnectorSmokeTest test. It might be prudent to just leave it as is, and consider it as a future optimization. I'll keep the code as is, and leave a TODO to enable caching of checkpoint files.
Enabling caching of checkpoint files could make the delta.checkpoint-filtering.enabled=true feature work better. When we tried enabling this feature earlier we saw a major slowdown of the planning and analysis phases. The in-memory cache of checkpoints seemed to no longer be used, and lots of requests were made by the coordinator to object storage to fetch checkpoint files. However, if we only need to read the checkpoint files from disk it could work without the in-memory checkpoint cache.
Ran the SF1000 TPC benchmark on 6 r6gd.8xlarge workers.
Results on TPC look pretty good: there is a significant reduction in wall time and some reduction in CPU time. As a follow-up we should look at exposing bytes read from cache as a connector metric; this will make it easy to see usage of the cache for each table scan in a query in the output of EXPLAIN ANALYZE VERBOSE, the queryinfo JSON, event listener metrics, etc.
I'm going to merge the Rubix drop in a while (#20102), so we can rebase and drop the conflict resolution commit from this PR.
@jkylling please rephrase the Rubix commit. We've decided internally to remove Rubix when we implement Hive caching with Alluxio. We will merge this PR and add Hive support in a separate change.
Co-authored-by: Florent Delannoy <florent.delannoy@gmail.com>
electrum left a comment:
See comments, otherwise looks good.
<groupId>org.alluxio</groupId>
<artifactId>alluxio-core-common</artifactId>
<version>${dep.alluxio.version}</version>
<exclusions>
Can we file an issue to fix these in Alluxio?
DistributedQueryRunner queryRunner = DistributedQueryRunner.builder(session)
        .build();
try {
    File metastoreDirectory = queryRunner.getCoordinator().getBaseDataDir().resolve("delta_lake_metastore").toFile().getAbsoluteFile();
We're trying to migrate the testing code so that
- we create connectors using properties rather than hand wiring
- we use OpenTelemetry tracing rather than custom tracking code
For example, 458bfd7 replaced a custom metastore wrapper with a getSpans() method on DistributedQueryRunner (take a look at the assertMetastoreInvocationsForQuery() utility method). If we can do a similar thing here, the test construction becomes simpler and easier to maintain, and we know that we're testing the same code that runs in production.
I'd like to remove TrackingFileSystemFactory, so it's best if we don't introduce more usages of it.
<scope>runtime</scope>
</dependency>

<dependency>
If we can convert TestDeltaLakeAlluxioCacheFileOperations to construct the connector with properties and use tracing (per my other comment), then we should be able to remove these runtime dependencies.
import java.util.List;

public class NoneCachingHostAddressProvider
I think NoCachingHostAddressProvider would sound better.
    factories.addBinding("gs").to(GcsFileSystemFactory.class);
}

newOptionalBinder(binder, CachingHostAddressProvider.class).setDefault().to(NoneCachingHostAddressProvider.class).in(Scopes.SINGLETON);
Rather than using an optional binder here, the configured cache implementation should bind the implementation. So the NONE cache should install NoCachingHostAddressProvider and ALLUXIO should install ConsistentHashingHostAddressProvider.
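A minimal sketch of the suggested structure, using plain Guice with hypothetical stand-in types (the real Trino modules are wired differently, e.g. via config-driven conditional modules): the selected cache implementation decides which CachingHostAddressProvider binding is installed, rather than relying on an optional binder with a default.

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Scopes;

public class FileSystemCacheModuleSketch
        extends AbstractModule
{
    // Hypothetical stand-ins for the types mentioned in the review comment
    public interface CachingHostAddressProvider {}
    public static class NoCachingHostAddressProvider implements CachingHostAddressProvider {}
    public static class ConsistentHashingHostAddressProvider implements CachingHostAddressProvider {}

    public enum CacheType { NONE, ALLUXIO }

    private final CacheType cacheType;

    public FileSystemCacheModuleSketch(CacheType cacheType)
    {
        this.cacheType = cacheType;
    }

    @Override
    protected void configure()
    {
        // Each cache type installs its own provider; no optional binder default is needed
        switch (cacheType) {
            case NONE -> bind(CachingHostAddressProvider.class)
                    .to(NoCachingHostAddressProvider.class)
                    .in(Scopes.SINGLETON);
            case ALLUXIO -> bind(CachingHostAddressProvider.class)
                    .to(ConsistentHashingHostAddressProvider.class)
                    .in(Scopes.SINGLETON);
        }
    }

    public static void main(String[] args)
    {
        CachingHostAddressProvider provider = Guice.createInjector(new FileSystemCacheModuleSketch(CacheType.ALLUXIO))
                .getInstance(CachingHostAddressProvider.class);
        System.out.println(provider.getClass().getSimpleName()); // ConsistentHashingHostAddressProvider
    }
}
```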
public Optional<String> getCacheKey(TrinoInputFile delegate)
        throws IOException
{
    return Optional.of(delegate.location().path() + delegate.lastModified());
This should use a separator that won't appear in file names. Otherwise, we could have a collision with a filename ending in a number.
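A small standalone illustration of the collision (the paths and timestamps are made up):

```java
public class CacheKeyCollisionExample
{
    public static void main(String[] args)
    {
        // Without a separator, a path ending in a digit can collide with a different path
        // that has a different modification time.
        String key1 = "s3://bucket/data1" + 23456L;  // file "data1" modified at 23456
        String key2 = "s3://bucket/data" + 123456L;  // file "data" modified at 123456
        System.out.println(key1.equals(key2));       // true -> same cache key for different files
    }
}
```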
public Optional<String> getCacheKey(TrinoInputFile delegate)
{
    // TODO: Consider caching of the Parquet checkpoint files within _delta_log
    if (!delegate.location().path().contains("/_delta_log/")) {
Why do we skip caching this directory?
The _delta_log directory contains the files _last_checkpoint and _trino_meta/extended_stats.json. These are not immutable, so they are tricky to cache. Also, the commit files of the form 000...123.json might not be immutable on ABFS. The checkpoint files should be immutable when accessed by Trino. We decided to leave it as a future optimization in #18719 (comment).
@colebow .. the release notes entry should link to the docs and maybe just say "Add support for filesystem caching". Same for the other incoming PRs for Iceberg, Hive, and Hudi.
Description
👋
This PR integrates Alluxio caching into Trino. It is a reworking of #16375 with:
- optimized-local-scheduling rather than introducing a new concept of node affinity
- lastModifiedTime passed from the coordinator to the workers via ConnectorSplit to allow for immutable caching without workers having to maintain their own file status cache
- TestDeltaLakeMinioAndHmsConnectorSmokeTest into TestCachingDeltaLakeMinioAndHmsConnectorSmokeTest

Additional context and related issues
This is very much #16375 from @beinan reworked after helpful feedback from @raunaqmorarka (thank you! 🙏 ).
I'm putting it out there so we can discuss next steps on integrating this into Trino 🥳 with the following notes:
- optimized-local-scheduling means each connector has to implement split scheduling by specifying an address on the splits it generates. Otherwise, each split will randomly be assigned to a worker node, and the cache won't be distributed.
- lastModifiedTime from the underlying file system, and use the cached file directly if available
- trino-delta-lake, the connector I'm developing this for, but will need to be implemented separately for other connectors to take full advantage of Alluxio
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: