Possible future development: towards a centralized storage infrastructure? #262
Towards a web-oriented, git-less database
Here is a description of the long-term goals. ChildProject should be able to interact with corpora using different storage backends:
For instance, the third option would apply to users of a centralized database. The centralized database (let's call it daylong-db) would come with two packages:
It should be possible to run processing pipelines remotely too. The API would return a handler that could be used to check the status of the job at any time and to retrieve the results. On the server side, the jobs could be run using Slurm on a local infrastructure, or on a cloud computing provider such as AWS. Note that it should be possible to convert corpora from any storage format to any other (e.g. export from CSV and import into a database, or the reverse).
Roadmap
Implementation
Stores
A store is an object that can fetch and update data from a given storage backend (local or remote, CSV or SQL, etc.). We'd have:
Which would all inherit from a Store abstract class, e.g.:

    from abc import ABC, abstractmethod
    from os.path import join
    from typing import List, Optional

    import pandas as pd
    from sqlalchemy.engine import Engine  # assuming Engine is SQLAlchemy's


    class Store(ABC):
        """Common interface to a corpus, whatever the storage backend."""

        def __init__(self):
            pass

        @abstractmethod
        def get_children(self):
            pass

        @abstractmethod
        def get_recordings(self):
            pass

        @abstractmethod
        def get_annotations(self, sets: Optional[List[str]] = None):
            pass

        @abstractmethod
        def add_child(self, child: dict):
            pass

        @abstractmethod
        def update_child(self, child: dict):
            pass

        @abstractmethod
        def delete_child(self, child: str):
            pass

        @abstractmethod
        def add_recording(self, recording: dict):
            pass

        @abstractmethod
        def update_recording(self, recording: dict):
            pass

        @abstractmethod
        def delete_recording(self, recording: str):
            pass

        @abstractmethod
        def add_annotations(self, annotations: pd.DataFrame):
            pass

        @abstractmethod
        def update_annotation(self):
            pass

        @abstractmethod
        def delete_annotations(self, annotations: pd.DataFrame):
            pass


    class CSVStore(Store):
        """Store backed by the CSV files of a local dataset."""

        def __init__(self, path):
            super().__init__()
            self.path = path

        def get_children(self):
            children = pd.read_csv(join(self.path, 'metadata/children.csv'))
            return children

        # etc.


    class SQLStore(Store):
        """Store backed by a SQL database that may host several corpora."""

        def __init__(self, engine: Engine, corpus: str):
            super().__init__()
            self.engine = engine
            self.conn = engine.connect()
            self.corpus = corpus

        def get_children(self):
            # `query` is a placeholder for the SQL that selects the children of self.corpus
            children = pd.read_sql(query, self.conn)
            return children

        # etc.
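One consequence of a shared interface is that converting a corpus from one storage format to another only needs the abstract methods, not any knowledge of the backends. The sketch below illustrates this under several assumptions: `MiniStore` is a hypothetical, reduced version of the interface (children only), `MemoryStore` is a hypothetical in-memory backend standing in for the CSV and SQL stores, and rows are plain dicts rather than the DataFrames used in the proposal, so that the example runs without dependencies. The `child_id` and `child_dob` fields are used for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MiniStore(ABC):
    """Hypothetical, reduced Store interface (children only)."""

    @abstractmethod
    def get_children(self) -> List[dict]:
        ...

    @abstractmethod
    def add_child(self, child: dict) -> None:
        ...


class MemoryStore(MiniStore):
    """Hypothetical in-memory backend, standing in for a CSV or SQL store."""

    def __init__(self) -> None:
        self._children: Dict[str, dict] = {}  # child_id -> row

    def get_children(self) -> List[dict]:
        # Return copies so callers cannot mutate the stored rows.
        return [dict(row) for row in self._children.values()]

    def add_child(self, child: dict) -> None:
        child_id = child["child_id"]
        if child_id in self._children:
            raise ValueError(f"child {child_id!r} already exists")
        self._children[child_id] = dict(child)


def copy_children(source: MiniStore, target: MiniStore) -> int:
    """Copy every child from source to target; return the number copied."""
    rows = source.get_children()
    for row in rows:
        target.add_child(row)
    return len(rows)


source = MemoryStore()
source.add_child({"child_id": "C1", "child_dob": "2019-05-01"})
source.add_child({"child_id": "C2", "child_dob": "2020-01-15"})

target = MemoryStore()
n_copied = copy_children(source, target)
```

Because `copy_children` only relies on the interface, the same function would work between any two backends that implement it. A full conversion would repeat the pattern for recordings and annotations.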
Pros
Cons
(This is WIP)
Is your feature request related to a problem? Please describe.
The current design is the one described in Managing, storing, and sharing long-form recordings and their annotations.
It can be summed up this way:
There are a few issues that remain unsolved by this design:
Although there are advantages to decentralization, these limitations call for (at least one) centralized database of daylong recordings.
I'll discuss two alternatives: