
Recursive exploration of remote datasets #7912

Merged · 27 commits · Jul 22, 2024
Commits
a54c5b3
WIP: Add recursive exploration of remote s3 layer
MichaelBuessemeyer Jun 21, 2024
f4e973a
WIP: finish first version of recursive exploration of remote s3 layer
MichaelBuessemeyer Jun 24, 2024
2ad6765
WIP: add gcs support
MichaelBuessemeyer Jun 24, 2024
d5fba44
WIP: add gcs support
MichaelBuessemeyer Jun 25, 2024
7b06e4e
WIP: run explorers in parallel on same subdirectory
MichaelBuessemeyer Jun 25, 2024
b7e7096
Code clean up (mainly extracted methods)
MichaelBuessemeyer Jun 25, 2024
20cf6b5
Merge branch 'master' of github.com:scalableminds/webknossos into rec…
MichaelBuessemeyer Jul 2, 2024
1e42c45
add local file system exploration
MichaelBuessemeyer Jul 2, 2024
c3f0ec3
Merge branch 'master' of github.com:scalableminds/webknossos into rec…
MichaelBuessemeyer Jul 3, 2024
2798b1d
do not include mutableReport in requests regarding the local file system
MichaelBuessemeyer Jul 3, 2024
3ea72a1
add missing override of listDirectory of MockDataVault
MichaelBuessemeyer Jul 3, 2024
abc7b2e
some cleanup
MichaelBuessemeyer Jul 3, 2024
10d5c3e
add command to build backend parts like in CI to be able to detect e…
MichaelBuessemeyer Jul 3, 2024
4db966f
clean up code
MichaelBuessemeyer Jul 3, 2024
bc6e93e
format backend code
MichaelBuessemeyer Jul 3, 2024
45a86ff
update docs to mention recursive exploration
MichaelBuessemeyer Jul 3, 2024
ca418b6
add changelog entry
MichaelBuessemeyer Jul 3, 2024
c8e35bf
apply some feedback
MichaelBuessemeyer Jul 5, 2024
aeb7569
Merge branch 'master' into recursive-exploration
MichaelBuessemeyer Jul 15, 2024
d6b5719
apply some feedback; Mainly extract methods in ExploreRemoteLayerServ…
MichaelBuessemeyer Jul 15, 2024
b8cdb7e
Merge branch 'recursive-exploration' of github.com:scalableminds/webk…
MichaelBuessemeyer Jul 15, 2024
e8d1da6
Merge branch 'master' of github.com:scalableminds/webknossos into recu…
MichaelBuessemeyer Jul 16, 2024
afea2df
Only let explorers of simple dataset formats explore for additional l…
MichaelBuessemeyer Jul 16, 2024
ef69593
apply pr feedback
MichaelBuessemeyer Jul 18, 2024
dafcc7e
Merge branch 'master' of github.com:scalableminds/webknossos into rec…
MichaelBuessemeyer Jul 18, 2024
84fea6d
restore accidentally deleted changelog entry
MichaelBuessemeyer Jul 22, 2024
d4b1465
Merge branch 'master' into recursive-exploration
MichaelBuessemeyer Jul 22, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.unreleased.md
@@ -14,12 +14,13 @@ For upgrade instructions, please check the [migration guide](MIGRATIONS.released
- Added that proofreading merge actions reuse custom names of segments. A merge action now combines the potential existing custom names of both segments and a split action copies the custom name to the split-off segment. [#7877](https://github.com/scalableminds/webknossos/pull/7877)
- Added the option for the owner to lock explorative annotations. Locked annotations cannot be modified by any user. An annotation can be locked in the annotations table and when viewing the annotation via the navbar dropdown menu. [#7801](https://github.com/scalableminds/webknossos/pull/7801)
- Added the option to set a default mapping for a dataset in the dataset view configuration. The default mapping is loaded when the dataset is opened and the user / URL does not configure something else. [#7858](https://github.com/scalableminds/webknossos/pull/7858)
- WEBKNOSSOS now automatically searches subfolders / sub-collection identifiers for valid datasets in case a provided link to a remote dataset does not directly point to a dataset. [#7912](https://github.com/scalableminds/webknossos/pull/7912)
- Uploading an annotation into a dataset that it was not created for now also works if the dataset is in a different organization. [#7816](https://github.com/scalableminds/webknossos/pull/7816)
- When downloading + reuploading an annotation that is based on a segmentation layer with an active mapping, that mapping is now still selected after the reupload. [#7822](https://github.com/scalableminds/webknossos/pull/7822)
- Added the ability to change the unit of the dataset voxel size to any supported unit of the [OME/NGFF standard](https://github.com/ome/ngff/blob/39605eec64ceff481bb3a98f0adeaa330ab1ef26/latest/index.bs#L192). This allows users to upload and work with low-resolution datasets with a different base unit than nanometer. [#7783](https://github.com/scalableminds/webknossos/pull/7783)
- In the Voxelytics workflow list, the name of the WEBKNOSSOS user who started the job is displayed. [#7794](https://github.com/scalableminds/webknossos/pull/7795)
- Start an alignment job (aligns the sections in a dataset) via the "AI Analysis" button. [#7820](https://github.com/scalableminds/webknossos/pull/7820)
- Added additional validation for the animation job modal. Bounding boxes must be larger then zero. [#7883](https://github.com/scalableminds/webknossos/pull/7883)
- Added additional validation for the animation job modal. Bounding boxes must be larger than zero. [#7883](https://github.com/scalableminds/webknossos/pull/7883)

### Changed
- The "WEBKNOSSOS Changelog" modal now lazily loads its content potentially speeding up the initial loading time of WEBKNOSSOS and thus improving the UX. [#7843](https://github.com/scalableminds/webknossos/pull/7843)
4 changes: 2 additions & 2 deletions docs/datasets.md
@@ -47,7 +47,7 @@ In particular, the following file formats are supported for uploading (and conve
Once the data is uploaded (and potentially converted), you can further configure a dataset's [Settings](#configuring-datasets) and double-check layer properties, fine-tune access rights & permissions, or set default values for rendering.

### Streaming from remote servers and the cloud
WEBKNOSSOS supports loading and remotely streaming [Zarr](https://zarr.dev), [Neuroglancer precomputed format](https://github.com/google/neuroglancer/tree/master/src/neuroglancer/datasource/precomputed) and [N5](https://github.com/saalfeldlab/n5) datasets from a remote source, e.g. Cloud storage (S3) or HTTP server.
WEBKNOSSOS supports loading and remotely streaming [Zarr](https://zarr.dev), [Neuroglancer precomputed format](https://github.com/google/neuroglancer/tree/master/src/neuroglancer/datasource/precomputed) and [N5](https://github.com/saalfeldlab/n5) datasets from a remote source, e.g. Cloud storage (S3 / GCS) or HTTP server.
WEBKNOSSOS supports loading Zarr datasets according to the [OME NGFF v0.4 spec](https://ngff.openmicroscopy.org/latest/).

WEBKNOSSOS can load several remote sources and assemble them into a WEBKNOSSOS dataset with several layers, e.g. one Zarr file/source for the `color` layer and one Zarr file/source for a `segmentation` layer.
@@ -57,7 +57,7 @@ With other converters, you may need to add the layers separately.
1. From the *Datasets* tab in the user dashboard, click the *Add Dataset* button.
2. Select the *Add Remote Dataset* tab
3. For each layer, provide some metadata information:
- a URL or domain/collection identifier to locate the dataset on the remote service (supported protocols are HTTPS, Amazon S3 and Google Cloud Storage).
- a URL or domain/collection identifier to locate the dataset on the remote service (supported protocols are HTTPS, Amazon S3, and Google Cloud Storage). In case the URL or domain/collection identifier does not directly point to a dataset, WEBKNOSSOS will try to locate a dataset in subfolders / sub-collection identifiers (see the example below this list).
- authentication credentials for accessing the resources on the remote service (optional)
4. Click the *Add Layer* button
5. WEBKNOSSOS will automatically try to infer as many dataset properties (voxel size, bounding box, etc.) as possible and preview a [WEBKNOSSOS `datasource` configuration](./data_formats.md#dataset-metadata-specification) for you to review.
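For example (hypothetical bucket layout): entering `s3://my-bucket/experiments/` will also discover a Zarr dataset stored at `s3://my-bucket/experiments/run1/`, even though the provided URL does not point at the dataset itself.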
@@ -381,7 +381,7 @@ function AddRemoteLayer({
const [showCredentialsFields, setShowCredentialsFields] = useState<boolean>(false);
const [usernameOrAccessKey, setUsernameOrAccessKey] = useState<string>("");
const [passwordOrSecretKey, setPasswordOrSecretKey] = useState<string>("");
const [selectedProtocol, setSelectedProtocol] = useState<"s3" | "https" | "gs">("https");
const [selectedProtocol, setSelectedProtocol] = useState<"s3" | "https" | "gs" | "file">("https");
const [fileList, setFileList] = useState<FileList>([]);

const handleChange = (info: UploadChangeParam<UploadFile<any>>) => {
@@ -394,12 +394,14 @@
if (userInput.startsWith("https://") || userInput.startsWith("http://")) {
setSelectedProtocol("https");
} else if (userInput.startsWith("s3://")) {
setSelectedProtocol("s3");
setSelectedProtocol("s3"); // Unused
} else if (userInput.startsWith("gs://")) {
setSelectedProtocol("gs");
} else if (userInput.startsWith("file://")) {
setSelectedProtocol("file"); // Unused
} else {
throw new Error(
"Dataset URL must employ one of the following protocols: https://, http://, s3:// or gs://",
"Dataset URL must employ one of the following protocols: https://, http://, s3://, gs:// or file://",
);
}
}
4 changes: 4 additions & 0 deletions package.json
@@ -76,6 +76,10 @@
"scripts": {
"start": "node tools/proxy/proxy.js",
"build": "node --max-old-space-size=4096 node_modules/.bin/webpack --env production",
"build-backend": "yarn build-wk-backend && yarn build-wk-datastore && yarn build-wk-tracingstore",
"build-wk-backend": "sbt -no-colors -DfailOnWarning compile stage",
"build-wk-datastore": "sbt -no-colors -DfailOnWarning \"project webknossosDatastore\" copyMessages compile stage",
"build-wk-tracingstore": "sbt -no-colors -DfailOnWarning \"project webknossosTracingstore\" copyMessages compile stage",
"build-dev": "node_modules/.bin/webpack",
"build-watch": "node_modules/.bin/webpack -w",
"listening": "lsof -i:5005,7155,9000,9001,9002",
2 changes: 1 addition & 1 deletion project/Dependencies.scala
@@ -56,7 +56,7 @@ object Dependencies {
// MultiArray (ndarray) handles. import ucar
"edu.ucar" % "cdm-core" % "5.4.2",
// Amazon S3 cloud storage client. import com.amazonaws
"com.amazonaws" % "aws-java-sdk-s3" % "1.12.584",
"com.amazonaws" % "aws-java-sdk-s3" % "1.12.584", // TODO Update?!!
MichaelBuessemeyer (Contributor, Author) commented:
There exists a version two of the lib, see https://github.com/aws/aws-sdk-java-v2/#using-the-sdk.

Do we want to migrate to V2? This should be rather easy, as the lib is AFAIK only used in S3DataVault.scala.

Reviewer (Member) replied:

I wrote #7913. Sounds good (reads like it may improve the async API?), but it should not block this PR.
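For illustration only: a rough sketch of the same prefix listing against the v2 SDK (`software.amazon.awssdk`). The names and wiring here are assumptions, not code from this PR:

```scala
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request

import scala.jdk.CollectionConverters._

// Sketch: list the "subdirectories" (common prefixes) below keyPrefix, capped at maxKeys.
def listCommonPrefixesV2(client: S3Client, bucket: String, keyPrefix: String, maxKeys: Int): List[String] = {
  val request = ListObjectsV2Request
    .builder()
    .bucket(bucket)
    .prefix(keyPrefix)
    .delimiter("/")
    .maxKeys(maxKeys)
    .build()
  // With a delimiter set, commonPrefixes holds one entry per "subdirectory"
  client.listObjectsV2(request).commonPrefixes().asScala.toList.map(_.prefix())
}
```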

// Google cloud storage client. import com.google.cloud.storage, import com.google.auth.oauth2
"com.google.cloud" % "google-cloud-storage" % "2.36.1",
// Blosc compression. import org.blosc
2 changes: 2 additions & 0 deletions test/backend/DataVaultTestSuite.scala
@@ -134,6 +134,8 @@ class DataVaultTestSuite extends PlaySpec {
class MockDataVault extends DataVault {
override def readBytesAndEncoding(path: VaultPath, range: RangeSpecifier)(
implicit ec: ExecutionContext): Fox[(Array[Byte], Encoding.Value)] = ???

override def listDirectory(path: VaultPath)(implicit ec: ExecutionContext): Fox[List[VaultPath]] = ???
}

"Uri has no trailing slash" should {
@@ -34,10 +34,11 @@ import play.api.libs.json.Json
import play.api.mvc.{Action, AnyContent, MultipartFormData, PlayBodyParsers}

import java.io.File
import com.scalableminds.webknossos.datastore.storage.AgglomerateFileKey
import com.scalableminds.webknossos.datastore.storage.{AgglomerateFileKey, DataVaultService}
import net.liftweb.common.{Box, Empty, Failure, Full}
import play.api.libs.Files

import java.net.URI
import scala.collection.mutable.ListBuffer
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._
@@ -721,10 +722,14 @@ class DataSourceController @Inject()(
Action.async(validateJson[ExploreRemoteDatasetRequest]) { implicit request =>
accessTokenService.validateAccess(UserAccessRequest.administrateDataSources(request.body.organizationName), token) {
val reportMutable = ListBuffer[String]()
val hasLocalFilesystemRequest = request.body.layerParameters.exists(param =>
new URI(param.remoteUri).getScheme == DataVaultService.schemeFile)
for {
dataSourceBox: Box[GenericDataSource[DataLayer]] <- exploreRemoteLayerService
.exploreRemoteDatasource(request.body.layerParameters, reportMutable)
.futureBox
// Clear the recursive-exploration report when exploring the local file system, to avoid exposing file system details.
_ <- Fox.runIf(hasLocalFilesystemRequest)(Fox.successful(reportMutable.clear()))
dataSourceOpt = dataSourceBox match {
case Full(dataSource) if dataSource.dataLayers.nonEmpty =>
reportMutable += s"Resulted in dataSource with ${dataSource.dataLayers.length} layers."
@@ -5,6 +5,9 @@ import com.scalableminds.util.tools.Fox
import scala.concurrent.ExecutionContext

trait DataVault {
val MAX_EXPLORED_ITEMS_PER_LEVEL = 10
def readBytesAndEncoding(path: VaultPath, range: RangeSpecifier)(
implicit ec: ExecutionContext): Fox[(Array[Byte], Encoding.Value)]

def listDirectory(path: VaultPath)(implicit ec: ExecutionContext): Fox[List[VaultPath]]
}
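To illustrate how the new `listDirectory` hook drives the recursive search, here is a minimal sketch. It is not the actual `ExploreRemoteLayerService` code: `exploreAt`, the depth handling, and the exact Fox combinators are assumptions for illustration.

```scala
import com.scalableminds.util.tools.Fox
import net.liftweb.common.{Box, Full}

import scala.concurrent.ExecutionContext

// Sketch: probe a path for a dataset; if nothing is found there, recurse into its subdirectories.
// exploreAt is a hypothetical single-level probe returning the layers found at exactly this path.
def exploreRecursively(path: VaultPath, remainingDepth: Int)(
    implicit ec: ExecutionContext): Fox[List[DataLayer]] =
  if (remainingDepth < 0) Fox.successful(List.empty)
  else
    for {
      layersBox: Box[List[DataLayer]] <- exploreAt(path).futureBox
      layers <- layersBox match {
        case Full(found) if found.nonEmpty => Fox.successful(found)
        case _ =>
          for {
            subDirs <- path.listDirectory() // each vault caps this at MAX_EXPLORED_ITEMS_PER_LEVEL
            subResults <- Fox.serialCombined(subDirs)(exploreRecursively(_, remainingDepth - 1))
          } yield subResults.flatten
      }
    } yield layers
```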
@@ -8,21 +8,28 @@ import org.apache.commons.lang3.builder.HashCodeBuilder

import java.nio.ByteBuffer
import java.nio.file.{Files, Path, Paths}
import java.util.stream.Collectors
import scala.concurrent.ExecutionContext
import scala.jdk.CollectionConverters._

class FileSystemDataVault extends DataVault {

override def readBytesAndEncoding(path: VaultPath, range: RangeSpecifier)(
implicit ec: ExecutionContext): Fox[(Array[Byte], Encoding.Value)] = {
private def vaultPathToLocalPath(path: VaultPath)(implicit ec: ExecutionContext): Fox[Path] = {
Reviewer (Member) commented:

Please move the private method below the public one that uses it, either below the first usage or further down. (That's the typical reading order of the backend files, or at least it should be 😅)

MichaelBuessemeyer (Contributor, Author) replied:

I hope that's correct now 🙈

val uri = path.toUri
for {
_ <- bool2Fox(uri.getScheme == DataVaultService.schemeFile) ?~> "trying to read from FileSystemDataVault, but uri scheme is not file"
_ <- bool2Fox(uri.getHost == null || uri.getHost.isEmpty) ?~> s"trying to read from FileSystemDataVault, but hostname ${uri.getHost} is non-empty"
localPath = Paths.get(uri.getPath)
_ <- bool2Fox(localPath.isAbsolute) ?~> "trying to read from FileSystemDataVault, but path is not absolute"
} yield localPath
}

override def readBytesAndEncoding(path: VaultPath, range: RangeSpecifier)(
implicit ec: ExecutionContext): Fox[(Array[Byte], Encoding.Value)] =
for {
localPath <- vaultPathToLocalPath(path)
bytes <- readBytesLocal(localPath, range)
} yield (bytes, Encoding.identity)
}

private def readBytesLocal(localPath: Path, range: RangeSpecifier)(implicit ec: ExecutionContext): Fox[Array[Byte]] =
if (Files.exists(localPath)) {
@@ -53,6 +60,19 @@ class FileSystemDataVault extends DataVault {
}
} else Fox.empty

override def listDirectory(path: VaultPath)(implicit ec: ExecutionContext): Fox[List[VaultPath]] =
  vaultPathToLocalPath(path).map { localPath =>
    // A `return` inside the lambda would escape non-locally, so use a plain if/else instead
    if (Files.isDirectory(localPath))
      Files
        .list(localPath)
        .filter(file => Files.isDirectory(file))
        .collect(Collectors.toList())
        .asScala
        .toList
        .map(dir => new VaultPath(dir.toUri, this))
        .take(MAX_EXPLORED_ITEMS_PER_LEVEL)
    else List.empty
  }

override def hashCode(): Int =
new HashCodeBuilder(19, 31).toHashCode

Expand Down
@@ -11,6 +11,7 @@ import java.io.ByteArrayInputStream
import java.net.URI
import java.nio.ByteBuffer
import scala.concurrent.ExecutionContext
import scala.jdk.CollectionConverters.IterableHasAsScala

class GoogleCloudDataVault(uri: URI, credential: Option[GoogleServiceAccountCredential]) extends DataVault {

@@ -72,6 +73,18 @@ class GoogleCloudDataVault(uri: URI, credential: Option[GoogleServiceAccountCred
} yield (bytes, encoding)
}

override def listDirectory(path: VaultPath)(implicit ec: ExecutionContext): Fox[List[VaultPath]] = {
val objName = path.toUri.getPath.tail
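// Storage.BlobListOption.currentDirectory() emulates a flat directory listing: sub-"directories" come back as Blobs with isDirectory == true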
val blobs = storage.list(bucket,
Storage.BlobListOption.prefix(objName),
Storage.BlobListOption.currentDirectory(),
Storage.BlobListOption.pageSize(MAX_EXPLORED_ITEMS_PER_LEVEL))
val subDirectories = blobs.getValues.asScala.toList.filter(_.isDirectory)
val paths = subDirectories.map(dirBlob =>
new VaultPath(new URI(s"${uri.getScheme}://$bucket/${dirBlob.getBlobId.getName}"), this))
Fox.successful(paths)
}

private def getUri = uri
private def getCredential = credential

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,9 @@ class HttpsDataVault(credential: Option[DataVaultCredential], ws: WSClient) exte

}

override def listDirectory(path: VaultPath)(implicit ec: ExecutionContext): Fox[List[VaultPath]] =
Fox.successful(List.empty)
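// Plain HTTP(S) has no standard directory-listing operation, so HTTP vaults opt out of recursive exploration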

private val headerInfoCache: AlfuCache[URI, (Boolean, Long)] = AlfuCache()

private def getHeaderInformation(uri: URI)(implicit ec: ExecutionContext): Fox[(Boolean, Long)] =
@@ -11,7 +11,7 @@ import com.amazonaws.auth.{
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}
import com.amazonaws.services.s3.model.{GetObjectRequest, S3Object}
import com.amazonaws.services.s3.model.{GetObjectRequest, ListObjectsV2Request, S3Object}
import com.amazonaws.util.AwsHostNameUtils
import com.scalableminds.util.tools.Fox
import com.scalableminds.webknossos.datastore.storage.{
@@ -26,6 +26,7 @@ import org.apache.commons.lang3.builder.HashCodeBuilder
import java.net.URI
import scala.collection.immutable.NumericRange
import scala.concurrent.ExecutionContext
import scala.jdk.CollectionConverters._

class S3DataVault(s3AccessKeyCredential: Option[S3AccessKeyCredential], uri: URI) extends DataVault {
private lazy val bucketName = S3DataVault.hostBucketFromUri(uri) match {
@@ -50,7 +51,8 @@

private def getRequest(bucketName: String, key: String): GetObjectRequest = new GetObjectRequest(bucketName, key)

private def performRequest(request: GetObjectRequest)(implicit ec: ExecutionContext): Fox[(Array[Byte], String)] = {
private def performGetObjectRequest(request: GetObjectRequest)(
implicit ec: ExecutionContext): Fox[(Array[Byte], String)] = {
var s3objectRef: Option[S3Object] = None // Used for cleanup later (possession of a S3Object requires closing it)
try {
val s3object = client.getObject(request)
@@ -73,6 +75,26 @@ class S3DataVault(s3AccessKeyCredential: Option[S3AccessKeyCredential], uri: URI
}
}

private def performGetObjectSummariesRequest(bucketName: String, keyPrefix: String)(
implicit ec: ExecutionContext): Fox[List[String]] =
try {
val listObjectsRequest = new ListObjectsV2Request
listObjectsRequest.setBucketName(bucketName)
listObjectsRequest.setPrefix(keyPrefix)
listObjectsRequest.setDelimiter("/")
listObjectsRequest.setMaxKeys(MAX_EXPLORED_ITEMS_PER_LEVEL)
val objectListing = client.listObjectsV2(listObjectsRequest)
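// With "/" as the delimiter, S3 groups deeper keys: each common prefix corresponds to one "subdirectory" below keyPrefix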
val s3SubPrefixes = objectListing.getCommonPrefixes.asScala.toList
Fox.successful(s3SubPrefixes)
} catch {
case e: AmazonServiceException =>
e.getStatusCode match {
case 404 => Fox.empty
case _ => Fox.failure(e.getMessage)
}
case e: Exception => Fox.failure(e.getMessage)
}

override def readBytesAndEncoding(path: VaultPath, range: RangeSpecifier)(
implicit ec: ExecutionContext): Fox[(Array[Byte], Encoding.Value)] =
for {
Expand All @@ -82,10 +104,18 @@ class S3DataVault(s3AccessKeyCredential: Option[S3AccessKeyCredential], uri: URI
case SuffixLength(l) => getSuffixRangeRequest(bucketName, objectKey, l)
case Complete() => getRequest(bucketName, objectKey)
}
(bytes, encodingString) <- performRequest(request)
(bytes, encodingString) <- performGetObjectRequest(request)
encoding <- Encoding.fromRfc7231String(encodingString)
} yield (bytes, encoding)

override def listDirectory(path: VaultPath)(implicit ec: ExecutionContext): Fox[List[VaultPath]] =
for {
prefixKey <- Fox.box2Fox(S3DataVault.objectKeyFromUri(path.toUri))
s3SubPrefixKeys <- performGetObjectSummariesRequest(bucketName, prefixKey)
vaultPaths <- Fox.successful(
s3SubPrefixKeys.map(key => new VaultPath(new URI(s"${uri.getScheme}://$bucketName/$key"), this)))
} yield vaultPaths

private def getUri = uri
private def getCredential = s3AccessKeyCredential

@@ -38,6 +38,8 @@ class VaultPath(uri: URI, dataVault: DataVault) extends LazyLogging {
}
}

def listDirectory()(implicit ec: ExecutionContext): Fox[List[VaultPath]] = dataVault.listDirectory(this)

private def decodeBrotli(bytes: Array[Byte]) = {
Brotli4jLoader.ensureAvailability()
val brotliInputStream = new BrotliInputStream(new ByteArrayInputStream(bytes))