mu-semtech/file-service

file-service

Microservice, based on mu-ruby-template, to upload and download files and store their file-specific metadata.

Tutorials

Add the file-service to a stack

Add the following snippet to your docker-compose.yml to include the file service in your project.

file:
  image: semtech/mu-file-service:3.4.0
  links:
    - database:database
  volumes:
    - ./data/files:/share

Start the service in your stack using docker-compose up -d file. The file service will be created.

Next, add rules to ./config/dispatcher/dispatcher.ex to dispatch all relevant requests starting with /files/ to the file service. E.g.

  define_accept_types [
    json: [ "application/vnd.api+json" ],
  ]

  ...

  get "/files/:id/download", %{ layer: :services } do
    Proxy.forward conn, [], "http://file/files/" <> id <> "/download"
  end

  post "/files/*path", %{ layer: :services } do
    Proxy.forward conn, path, "http://file/files/"
  end

  delete "/files/*path", %{ accept: [ :json ], layer: :services } do
    Proxy.forward conn, path, "http://file/files/"
  end

The host file in the forward URL reflects the name of the file service in the docker-compose.yml file.

Finally, update the authorization configuration config/authorization/config.ex to make sure the user has appropriate read/write access on the resource type nfo:FileDataObject. E.g.

    ...
    constraint: %ResourceConstraint{
      resource_types: [
        "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject",
        ...
      ]
    }
    ...

Restart the dispatcher and authorization database to pick up the new configuration:

docker-compose restart dispatcher database

More information on how to set up a mu.semte.ch project can be found in mu-project.

How-to guides

How to configure file resources in mu-cl-resources

If you want to model the files of the file service in the domain of your mu-cl-resources service, add the following snippet to your resource configuration.

If you use the Lisp configuration format, add the following to your domain.lisp:

(define-resource file ()
  :class (s-prefix "nfo:FileDataObject")
  :properties `((:name :string ,(s-prefix "nfo:fileName"))
                (:format :string ,(s-prefix "dct:format"))
                (:size :number ,(s-prefix "nfo:fileSize"))
                (:extension :string ,(s-prefix "dbpedia:fileExtension"))
                (:created :datetime ,(s-prefix "dct:created")))
  :has-one `((file :via ,(s-prefix "nie:dataSource")
                   :inverse t
                   :as "download"))
  :resource-base (s-url "http://data.example.com/files/")
  :features `(include-uri)
  :on-path "files")

And configure these prefixes in your repository.lisp:

(add-prefix "nfo" "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#")
(add-prefix "nie" "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#")
(add-prefix "dct" "http://purl.org/dc/terms/")
(add-prefix "dbpedia" "http://dbpedia.org/ontology/")

If you use the JSON configuration format, add the following to your domain.json:

{
  "version": "0.1",
  "prefixes": {
    "nfo": "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "nie": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#",
    "dct": "http://purl.org/dc/terms/",
    "dbpedia": "http://dbpedia.org/ontology/"
  },
  "resources": {
    "files": {
      "name": "file",
      "class": "nfo:FileDataObject",
      "attributes": {
        "name": {
          "type": "string",
          "predicate": "nfo:fileName"
        },
        "format": {
          "type": "string",
          "predicate": "dct:format"
        },
        "size": {
          "type": "integer",
          "predicate": "nfo:fileSize"
        },
        "extension": {
          "type": "string",
          "predicate": "dbpedia:fileExtension"
        },
        "created": {
          "type": "datetime",
          "predicate": "dct:created"
        }
      },
      "relationships": {
        "download": {
          "predicate": "nie:dataSource",
          "target": "file",
          "cardinality": "one",
          "inverse": true
        }
      },
      "new-resource-base": "http://data.example.com/files/",
      "features": ["include-uri"]
    }
  }
}

Next, add the following rule to ./config/dispatcher/dispatcher.ex. Make sure to add it somewhere after the rule forwarding /files/:id/download.

  define_accept_types [
    json: [ "application/vnd.api+json" ],
  ]

  ...

  get "/files/*path", %{ accept: [ :json ], layer: :services } do
    Proxy.forward conn, path, "http://resource/files/"
  end

Finally, restart the services to pick up the configuration changes:

docker-compose restart resource dispatcher

How to upload a file using a curl command

Assuming mu-dispatcher is running on localhost:80, a file can be uploaded using:

curl -i -X POST -H "Content-Type: multipart/form-data" -F "file=@/a/file.somewhere" http://localhost/files
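The same upload can be sketched in Python using only the standard library. This is a minimal illustration, not part of the file service: the helper names build_multipart and upload_request are hypothetical, and the base URL assumes the dispatcher setup described above.

```python
import uuid
import urllib.request


def build_multipart(field, filename, content, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_request(base_url, filename, content):
    """Prepare (but do not send) a POST /files request with a 'file' parameter."""
    body, header = build_multipart("file", filename, content)
    return urllib.request.Request(
        f"{base_url}/files",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )


# Assumption: the dispatcher is reachable on http://localhost.
req = upload_request("http://localhost", "file.somewhere", b"hello")
# urllib.request.urlopen(req) would send the request to a running stack.
```

The request is only constructed here, so the sketch can be inspected without a running stack.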

How to upgrade the file service from 2.x to 3.x

To upgrade the file service from 2.x to 3.x a migration must be executed in the form of a SPARQL query.

If you use mu-migrations-service, add the following SPARQL query in a *.sparql file in your migrations folder. Otherwise, execute the SPARQL query directly against the datastore. Note: this will break support for file service 2.x!

PREFIX mu: <http://mu.semte.ch/vocabularies/core/>
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

WITH <http://mu.semte.ch/application>
DELETE {
  ?uploadUri nfo:fileUrl ?fileUrl .
} INSERT {
  ?uploadUri dbpedia:fileExtension ?extension .
  ?fileUri a nfo:FileDataObject ;
    mu:uuid ?fileUuid ;
    nfo:fileName ?fileName ;
    dct:format ?format ;
    nfo:fileSize ?fileSize ;
    dbpedia:fileExtension ?extension ;
    dct:created ?created ;
    dct:modified ?modified ;
    nie:dataSource ?uploadUri .
} WHERE {
  ?uploadUri a nfo:FileDataObject ;
    nfo:fileName ?fileName ;
    dct:format ?format ;
    nfo:fileSize ?fileSize ;
    nfo:fileUrl ?fileUrl ;
    dct:created ?created ;
    dct:modified ?modified .

  OPTIONAL { ?fileUrl mu:uuid ?id }
  BIND(IF(BOUND(?id), ?id, STRUUID()) as ?fileUuid)

  BIND(IRI(REPLACE(STR(?fileUrl), "file:///data/", "share://")) as ?fileUri)

  BIND(STRAFTER(?fileName, ".") as ?extension)
}
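To make the BIND expressions in the migration easier to follow, the sketch below mirrors them in Python. It is an illustration only: the function names are hypothetical and nothing here runs against the triplestore. Note that, like STRAFTER, the extension helper returns everything after the first dot, and an empty string when the name contains no dot.

```python
import re
import uuid


def migrate_file_url(file_url):
    """Mirror REPLACE(STR(?fileUrl), "file:///data/", "share://")."""
    return re.sub("file:///data/", "share://", file_url)


def extension_of(file_name):
    """Mirror STRAFTER(?fileName, "."): the text after the first dot."""
    return file_name.partition(".")[2]


def file_uuid(existing_id=None):
    """Mirror IF(BOUND(?id), ?id, STRUUID()): reuse the id or mint a new one."""
    return existing_id if existing_id is not None else str(uuid.uuid4())


print(migrate_file_url("file:///data/report.pdf"))
print(extension_of("archive.tar.gz"))
```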

Reference

Model

Data model

Ontologies and prefixes

The file service is mainly built around the Nepomuk File Ontology.

| Prefix | URI |
|---|---|
| nfo | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo# |
| nie | http://www.semanticdesktop.org/ontologies/2007/01/19/nie# |
| dct | http://purl.org/dc/terms/ |
| dbpedia | http://dbpedia.org/ontology/ |

Files

Description

The file service represents an uploaded file as two resources in the triplestore: a resource reflecting the (virtual) uploaded file and another resource reflecting the (physical) resulting file stored on disk.

The URI of the stored file uses the share:// protocol and reflects the location where the file resides as a relative path to the share folder. E.g. share://uploads/my-file.pdf means the file is stored at /share/uploads/my-file.pdf.
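The mapping from a share:// URI to a location on disk can be sketched as follows. The function name share_uri_to_path is hypothetical, and the default root /share assumes the volume mount shown in the tutorial.

```python
import posixpath

SHARE_SCHEME = "share://"


def share_uri_to_path(uri, share_root="/share"):
    """Resolve a share:// URI to a path inside the mounted share folder."""
    if not uri.startswith(SHARE_SCHEME):
        raise ValueError(f"not a share:// URI: {uri}")
    # The remainder of the URI is a path relative to the share folder.
    relative = uri[len(SHARE_SCHEME):].lstrip("/")
    return posixpath.join(share_root, relative)


print(share_uri_to_path("share://uploads/my-file.pdf"))
# prints /share/uploads/my-file.pdf
```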

Class

nfo:FileDataObject

Properties
| Name | Predicate | Range | Definition |
|---|---|---|---|
| name | nfo:fileName | xsd:string | Name of the uploaded file |
| format | dct:format | xsd:string | MIME-type of the file |
| size | nfo:fileSize | xsd:integer | Size of the file in bytes |
| extension | dbpedia:fileExtension | xsd:string | Extension of the file |
| created | dct:created | xsd:dateTime | Upload datetime |
| dataSource | nie:dataSource | nfo:FileDataObject | Uploaded file this file originates from (only set on stored file) |

Configuration

Environment variables

The following settings can be configured via environment variables:

| ENV | Description | Default | Required |
|---|---|---|---|
| FILE_RESOURCE_BASE | Base URI for a new upload-file resource. Must end with a trailing /. It will be concatenated with a uuid | http://mu.semte.ch/services/file-service/files/ | |
| MU_APPLICATION_FILE_STORAGE_PATH | Mounted subfolder where you want to store your files. It must be a relative path to /share/ in the Docker container | None | |
| VALIDATE_READABLE_METADATA | Whether metadata of files must be readable on upload | false | |
| MU_SPARQL_ENDPOINT | SPARQL read endpoint URL | http://database:8890/sparql | |
| MU_SPARQL_TIMEOUT | Timeout (in seconds) for SPARQL queries | 60 | |
| LOG_LEVEL | The level of logging. Options: debug, info, warn, error, fatal | info | |
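As an illustration of how these settings and their defaults fit together, here is a small Python sketch. It is not the service's own code (the service is built on mu-ruby-template); the names load_settings and new_resource_uri are hypothetical, and the validation rules are taken only from the descriptions above.

```python
import uuid

LOG_LEVELS = ("debug", "info", "warn", "error", "fatal")

# Defaults as listed in the environment variable overview.
DEFAULTS = {
    "FILE_RESOURCE_BASE": "http://mu.semte.ch/services/file-service/files/",
    "MU_APPLICATION_FILE_STORAGE_PATH": None,
    "VALIDATE_READABLE_METADATA": "false",
    "MU_SPARQL_ENDPOINT": "http://database:8890/sparql",
    "MU_SPARQL_TIMEOUT": "60",
    "LOG_LEVEL": "info",
}


def load_settings(env):
    """Overlay the given environment mapping on the documented defaults."""
    settings = dict(DEFAULTS)
    settings.update({key: env[key] for key in DEFAULTS if key in env})
    if not settings["FILE_RESOURCE_BASE"].endswith("/"):
        raise ValueError("FILE_RESOURCE_BASE must end with a trailing /")
    if settings["LOG_LEVEL"] not in LOG_LEVELS:
        raise ValueError(f"unknown LOG_LEVEL: {settings['LOG_LEVEL']}")
    return settings


def new_resource_uri(settings, file_uuid=None):
    """Concatenate the resource base with a uuid, as described for FILE_RESOURCE_BASE."""
    return settings["FILE_RESOURCE_BASE"] + (file_uuid or str(uuid.uuid4()))
```

Calling load_settings(os.environ) would apply the sketch to the real process environment.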

File storage location

By default the file service stores the files in the root of the mounted volume /share/. You can configure the service to store the files in a mounted subfolder through the MU_APPLICATION_FILE_STORAGE_PATH environment variable. It must be a relative path to /share/ in the Docker container.

E.g.

file:
  image: semtech/mu-file-service:3.4.0
  links:
    - database:database
  environment:
    MU_APPLICATION_FILE_STORAGE_PATH: "my-project/uploads/"
  volumes:
    - ./data/my-project/uploads:/share/my-project/uploads

The subfolder will be taken into account when generating the file URI. A URI for a file stored using the file service configured above will look like share://my-project/uploads/example.pdf.
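How the configured subfolder ends up in the file URI can be sketched like this. The function name stored_file_uri and the file name example.pdf are purely illustrative; how the service chooses the stored file name is not covered here.

```python
def stored_file_uri(file_name, storage_path=""):
    """Build a share:// URI from the storage subfolder and a file name."""
    # Drop empty segments so leading or trailing slashes do not matter.
    parts = [segment for segment in storage_path.split("/") if segment]
    parts.append(file_name)
    return "share://" + "/".join(parts)


print(stored_file_uri("example.pdf", "my-project/uploads/"))
# prints share://my-project/uploads/example.pdf
```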

Database connection

The triple store used in the backend is linked to the file service container as database. If you configure another SPARQL endpoint URL through MU_SPARQL_ENDPOINT, update the link name accordingly. Make sure the file service is able to execute update queries against this store.

REST API

POST /files

Upload a file. Accepts a multipart/form-data request with a file parameter containing the uploaded file.

Response
201 Created

Returned on successful upload, with the newly created file in the response body:

{
  "links": {
    "self": "files/b178ba66-206e-4551-b41e-4a46983912c0"
  },
  "data": {
    "type": "files",
    "id": "b178ba66-206e-4551-b41e-4a46983912c0",
    "attributes": {
      "name": "upload-name.pdf",
      "format": "application/pdf",
      "size": 1930,
      "extension": "pdf"
    }
  }
}

400 Bad Request

If the file parameter is missing.
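A client typically needs the id from the upload response to fetch or download the file afterwards. The sketch below, with the hypothetical helper download_path, reads the id from a response shaped like the example above and derives the download path.

```python
import json

# Sample body shaped like the 201 Created response above.
SAMPLE_RESPONSE = """
{
  "links": {"self": "files/b178ba66-206e-4551-b41e-4a46983912c0"},
  "data": {
    "type": "files",
    "id": "b178ba66-206e-4551-b41e-4a46983912c0",
    "attributes": {
      "name": "upload-name.pdf",
      "format": "application/pdf",
      "size": 1930,
      "extension": "pdf"
    }
  }
}
"""


def download_path(response_body):
    """Return the GET /files/:id/download path for an upload response."""
    document = json.loads(response_body)
    return f"/files/{document['data']['id']}/download"


print(download_path(SAMPLE_RESPONSE))
# prints /files/b178ba66-206e-4551-b41e-4a46983912c0/download
```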

GET /files/:id

Get metadata of the file with the given id.

Response
200 OK

Returns the metadata of the file with the given id.

{
  "links": {
    "self": "files/b178ba66-206e-4551-b41e-4a46983912c0"
  },
  "data": {
    "type": "files",
    "id": "b178ba66-206e-4551-b41e-4a46983912c0",
    "attributes": {
      "name": "upload-name.pdf",
      "format": "application/pdf",
      "size": 1930,
      "extension": "pdf"
    }
  }
}

404 Not Found

If a file with the given id cannot be found.

GET /files/:id/download

Download the content of the file with the given id.

Query parameters
  • name (optional): name for the downloaded file (e.g. /files/1/download?name=report.pdf)
  • content-disposition (optional): specify with which Content-Disposition header-value you want the service to respond. Defaults to attachment.
Response
200 OK

Expected response, the file is returned.

404 Not Found

No file could be found with the given id.

500 Internal Server Error

A file with the given id could be found in the database but not on disk. This is most likely due to a configuration issue on the server.
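Building a download URL with the optional query parameters can be sketched as follows. The helper name download_url and the base URL are assumptions for illustration; only the name and content-disposition parameters come from the description above.

```python
from urllib.parse import quote, urlencode


def download_url(base_url, file_id, name=None, content_disposition=None):
    """Build a GET /files/:id/download URL with optional query parameters."""
    params = {}
    if name is not None:
        params["name"] = name
    if content_disposition is not None:
        params["content-disposition"] = content_disposition
    url = f"{base_url}/files/{quote(file_id, safe='')}/download"
    return f"{url}?{urlencode(params)}" if params else url


print(download_url("http://localhost", "1", name="report.pdf"))
# prints http://localhost/files/1/download?name=report.pdf
```

Omitting both parameters yields the plain download URL, in which case the service falls back to its default Content-Disposition of attachment.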

DELETE /files/:id

Delete the file (metadata and content) with the given id.

Response
204 No Content

On successful delete.