Feature/sql store #125 (Merged)

Commits (16, all by simonwoerpel):
- c4f7503 Fix leveldb store.pop()
- 33e5456 SQL statement table: more index
- 84600b6 SQL: mess around with ensure_tx contextmanager
- ea0c427 Implement SQLStore for statements
- cebd4d7 Add tests for sqlstore vs all the stores
- 20d5daa Sql: Tweak statements query lookup
- 5d9dd5a Tests: Ensure same number of entities after upsert
- f968f90 Revert ensure_tx function to original
- 2d0dd85 Make statement table dt fields nullable
- f2c60cf Move pack_stmt function to serializers helper module
- 2389094 Make mypy happy with simplified transaction context handling
- 565dc10 pack_sql_statement: ensure iso dateformat
- 86f288d Sql: fix resolver connected lookup
- 8ed0544 Sql: make sql iteration streamable or not
- c1dc320 Sql: order_by canonical_id and make canonical_id required in table
- 3d25cb4 Sql: drop unnecessary batch_size in bulk writer
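One commit above, "pack_sql_statement: ensure iso dateformat", concerns how statements are serialized for the table. The serializer itself is not part of this diff, so the snippet below is only a hypothetical sketch of what normalizing datetime fields to ISO strings could look like; the helper name and field handling are assumptions, not the actual pack_sql_statement.

```python
from datetime import datetime
from typing import Any, Dict


def pack_statement_sketch(row: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical helper, not the real pack_sql_statement: render datetime
    # values (e.g. first_seen / last_seen) as ISO 8601 strings so they land
    # in the statement table in a consistent text format.
    packed: Dict[str, Any] = {}
    for key, value in row.items():
        packed[key] = value.isoformat() if isinstance(value, datetime) else value
    return packed
```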
@@ -0,0 +1,173 @@
from typing import Any, Generator, List, Optional, Set, Tuple

from followthemoney.property import Property
from sqlalchemy import Table, create_engine, delete, select
from sqlalchemy.sql.selectable import Select

from nomenklatura.dataset import DS
from nomenklatura.db import (
    DB_URL,
    POOL_SIZE,
    get_metadata,
    get_statement_table,
    get_upsert_func,
)
from nomenklatura.entity import CE
from nomenklatura.resolver import Resolver
from nomenklatura.statement import Statement
from nomenklatura.statement.serialize import pack_sql_statement
from nomenklatura.store import Store, View, Writer


class SqlStore(Store[DS, CE]):
    def __init__(
        self,
        dataset: DS,
        resolver: Resolver[CE],
        uri: str = DB_URL,
        **engine_kwargs: Any,
    ):
        super().__init__(dataset, resolver)
        engine_kwargs["pool_size"] = engine_kwargs.pop("pool_size", POOL_SIZE)
        self.metadata = get_metadata()
        self.engine = create_engine(uri, **engine_kwargs)
        self.table = get_statement_table()
        self.metadata.create_all(self.engine, tables=[self.table], checkfirst=True)

    def writer(self) -> Writer[DS, CE]:
        return SqlWriter(self)

    def view(self, scope: DS, external: bool = False) -> View[DS, CE]:
        return SqlView(self, scope, external=external)

    def _execute(self, q: Select, many: bool = True) -> Generator[Any, None, None]:
        # execute any read query against sql backend
        with self.engine.connect() as conn:
            if many:
                # stream results in chunks to keep memory use bounded
                conn = conn.execution_options(stream_results=True)
                cursor = conn.execute(q)
                while rows := cursor.fetchmany(10_000):
                    yield from rows
            else:
                yield from conn.execute(q)

    def _iterate_stmts(
        self, q: Select, many: bool = True
    ) -> Generator[Statement, None, None]:
        for row in self._execute(q, many=many):
            yield Statement.from_db_row(row)

    def _iterate(self, q: Select, many: bool = True) -> Generator[CE, None, None]:
        # group consecutive statements by entity id and assemble them into
        # entity proxies; relies on the query being ordered accordingly
        current_id = None
        current_stmts: List[Statement] = []
        for stmt in self._iterate_stmts(q, many=many):
            entity_id = stmt.entity_id
            if current_id is None:
                current_id = entity_id
            if current_id != entity_id:
                proxy = self.assemble(current_stmts)
                if proxy is not None:
                    yield proxy
                current_id = entity_id
                current_stmts = []
            current_stmts.append(stmt)
        if len(current_stmts):
            proxy = self.assemble(current_stmts)
            if proxy is not None:
                yield proxy


class SqlWriter(Writer[DS, CE]):
    BATCH_STATEMENTS = 10_000

    def __init__(self, store: SqlStore[DS, CE]):
        self.store: SqlStore[DS, CE] = store
        self.batch: Optional[Set[Statement]] = None
        self.insert = get_upsert_func(self.store.engine)

    def flush(self) -> None:
        # upsert the current batch, updating existing rows on conflicting id
        if self.batch:
            values = [pack_sql_statement(s) for s in self.batch]
            istmt = self.insert(self.store.table).values(values)
            stmt = istmt.on_conflict_do_update(
                index_elements=["id"],
                set_=dict(
                    canonical_id=istmt.excluded.canonical_id,
                    schema=istmt.excluded.schema,
                    prop_type=istmt.excluded.prop_type,
                    target=istmt.excluded.target,
                    lang=istmt.excluded.lang,
                    original_value=istmt.excluded.original_value,
                    last_seen=istmt.excluded.last_seen,
                ),
            )
            with self.store.engine.connect() as conn:
                conn.begin()
                conn.execute(stmt)
                conn.commit()
        self.batch = set()

    def add_statement(self, stmt: Statement) -> None:
        if self.batch is None:
            self.batch = set()
        if stmt.entity_id is None:
            return
        if len(self.batch) >= self.BATCH_STATEMENTS:
            self.flush()
        # stamp the canonical id before the statement goes into the batch
        canonical_id = self.store.resolver.get_canonical(stmt.entity_id)
        stmt.canonical_id = canonical_id
        self.batch.add(stmt)

    def pop(self, entity_id: str) -> List[Statement]:
        # read, then delete, all statements for the given entity id
        self.flush()
        table = self.store.table
        q = select(table).where(table.c.entity_id == entity_id)
        q_delete = delete(table).where(table.c.entity_id == entity_id)
        statements: List[Statement] = list(self.store._iterate_stmts(q))
        with self.store.engine.connect() as conn:
            conn.begin()
            conn.execute(q_delete)
            conn.commit()
        return statements


class SqlView(View[DS, CE]):
    def __init__(
        self, store: SqlStore[DS, CE], scope: DS, external: bool = False
    ) -> None:
        super().__init__(store, scope, external=external)
        self.store: SqlStore[DS, CE] = store

    def get_entity(self, id: str) -> Optional[CE]:
        # look up statements for every id the resolver considers connected
        table = self.store.table
        ids = [str(i) for i in self.store.resolver.connected(id)]
        q = select(table).where(
            table.c.entity_id.in_(ids), table.c.dataset.in_(self.dataset_names)
        )
        for proxy in self.store._iterate(q, many=False):
            return proxy
        return None

    def get_inverted(self, id: str) -> Generator[Tuple[Property, CE], None, None]:
        # find entities that reference the given id via an entity-typed property
        table = self.store.table
        q = (
            select(table)
            .where(table.c.prop_type == "entity", table.c.value == id)
Review comment on the line above: "I think it needs to check …"
            .distinct(table.c.value)
        )
        for stmt in self.store._iterate_stmts(q):
            if stmt.canonical_id is not None:
                entity = self.get_entity(stmt.canonical_id)
                if entity is not None:
                    for prop, value in entity.itervalues():
                        if value == id and prop.reverse is not None:
                            yield prop.reverse, entity

    def entities(self) -> Generator[CE, None, None]:
        # stream all statements in scope, ordered by canonical_id so that
        # _iterate can group them into entities
        table: Table = self.store.table
        q = (
            select(table)
            .where(table.c.dataset.in_(self.dataset_names))
            .order_by("canonical_id")
        )
        yield from self.store._iterate(q)
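Taken together, SqlStore, SqlWriter and SqlView follow the same Store/Writer/View pattern as the existing in-memory and LevelDB stores. The following is only a rough usage sketch based on the classes in this diff; the Dataset/Resolver setup and the SQLite URI are assumptions, not something this PR prescribes.

```python
from nomenklatura.dataset import Dataset      # assumed setup, not part of this diff
from nomenklatura.resolver import Resolver

# SqlStore / SqlWriter / SqlView as defined in the new file above.
dataset = Dataset.make({"name": "demo", "title": "Demo dataset"})
resolver = Resolver()
store = SqlStore(dataset, resolver, uri="sqlite:///statements.db")

writer = store.writer()
statements = []  # in practice, Statement objects produced by an importer
for stmt in statements:
    writer.add_statement(stmt)  # stamps canonical_id and batches the statement
writer.flush()                  # upserts the batch into the statement table

view = store.view(dataset)
for entity in view.entities():  # streams entities grouped by canonical_id
    print(entity.id, entity.schema.name)
```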
Review comments:

Ah, interesting, so the idea here is that we're not believing the stmt.canonical_id in the table? While that works, it's a bit different from all the other store implementations we have now...

hm, no, i guess i just misunderstood your comment here: #125 (comment)
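The exchange above turns on whether reads should trust the canonical_id column that SqlWriter stamps at write time, or re-resolve ids at read time the way SqlView.get_entity does. A minimal sketch of the two query shapes, using the statement table and resolver from this diff (the helper names are mine, not part of the PR):

```python
from typing import List

from sqlalchemy import Table, select
from sqlalchemy.sql.selectable import Select

from nomenklatura.resolver import Resolver


def query_by_stored_canonical(table: Table, canonical_id: str) -> Select:
    # Strategy 1: trust the canonical_id column that SqlWriter stamped at write time.
    return select(table).where(table.c.canonical_id == canonical_id)


def query_by_resolved_ids(table: Table, resolver: Resolver, entity_id: str) -> Select:
    # Strategy 2: re-resolve at read time, as SqlView.get_entity does in this PR,
    # by expanding the id to everything the resolver considers connected to it.
    connected: List[str] = [str(i) for i in resolver.connected(entity_id)]
    return select(table).where(table.c.entity_id.in_(connected))
```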