This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Commit
V7.17.1 (#3)
Upgrade to ES 7.x and Python 3.7

- Removed doc_type
- Added ES auth system in tests
- Removed mappings level in schema.json
- Replaced all deprecated `body` parameters
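The `body` deprecation in the 7.x Python client is handled throughout this commit by unpacking the query dict into keyword arguments. A minimal illustrative sketch of the pattern (`elastic_query` and `call_kwargs` are made-up names, not repo code):

```python
elastic_query = {"query": {"match_all": {}}, "size": 10}

# 7.x client deprecates:  elastic.search(index="myindex", body=elastic_query)
# this commit uses:       elastic.search(index="myindex", **elastic_query)

def call_kwargs(**kwargs):
    # Stand-in for the client: echoes the keyword arguments it receives
    return kwargs

print(call_kwargs(index="myindex", **elastic_query))
```

The unpacking is plain Python, so the former `body` keys (`query`, `size`, …) arrive as top-level keyword arguments, which is what the 7.x client expects.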
ArnaudParant authored Mar 19, 2024
1 parent e4ae7d4 commit 6a6a2ce
Showing 26 changed files with 162 additions and 167 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
-FROM python:3.6
+FROM python:3.7

ADD sel /sel
ADD setup.py .
3 changes: 3 additions & 0 deletions Makefile
@@ -21,6 +21,9 @@ upshell: docker-test
docker-compose -f tests/docker-compose.yml -f tests/docker-compose.add_volumes.yml exec tests bash
docker-compose -f tests/docker-compose.yml down

+down-tests:
+docker-compose -f tests/docker-compose.yml down

install-sphinx:
pip install sphinx sphinx_rtd_theme myst-parser
pip install -e .
5 changes: 3 additions & 2 deletions README.md
@@ -7,7 +7,7 @@ The project is split into two sub projects:


## Versions
-Two first digits of SEL version match Elasticsearch version and then it's the inner SEL version, eg 6.8.1 works with ES 6.8, v1 of SEL for this version of ES
+Two first digits of SEL version match Elasticsearch version and then it's the inner SEL version, eg 7.17.1 works with ES 7.17, v1 of SEL for this version of ES


## Full documentation
@@ -25,7 +25,7 @@ Be aware it will request ES schema at any query generation.

#### Add as dependency
```
-sel @ git+https://github.com/ArnaudParant/sel.git@v6.8.1
+sel @ git+https://github.com/ArnaudParant/sel.git@v7.17.1
```

### SEL as ES interface
@@ -66,6 +66,7 @@ See [SEL Server](https://github.com/ArnaudParant/sel_server) for API usage
- **lint** - Lint the code
- **tests** - Run all tests
- **upshell** - Up a shell into the docker, useful to run only few tests.
+- **down-tests** - Down tests, in case of failed tests
- **install-sphinx** - Install Sphinx and dependencies to generate documentation.
- **doc** - Generate the documentation in `docs/build/html/`
- **clean** - Clean all `__pycache__`
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -15,7 +15,7 @@
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

-project = 'SEL 6.8.1'
+project = 'SEL 7.17.1'
copyright = '2024, Arnaud Parant'
author = 'Arnaud Parant'

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -3,7 +3,7 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
-Welcome to Simple Elastic Language's documentation v6.8.1 !
+Welcome to Simple Elastic Language's documentation v7.17.1 !
===========================================================

.. toctree::
12 changes: 6 additions & 6 deletions docs/source/query_guide.md
@@ -54,8 +54,8 @@ Used for paging, it does not impact aggregations, such:

**Extended**

-Allowed keys: `_source`, `fields`, `script_fields`, `fielddata_fields`, `explain`, `highlight`, `rescore`, `version`, `indices_boost`, `min_score`.
-See [ES Search request body](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-body.html)
+Allowed keys: `_source`, `fields`, `script_fields`, `explain`, `version`, `indices_boost`, `min_score`.
+See [ES Search request body](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-search.html)

**Examples**

@@ -124,7 +124,7 @@ label.entity = 'bg:model'

#### Query String

-Query string use [ES query string format](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html)
+Query string use [ES query string format](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-query-string-query.html)

```
label ~ "*pant*"
@@ -133,7 +133,7 @@ label ~ "*pant*"

#### Not match query string

-Query string use [ES query string format](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html)
+Query string use [ES query string format](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-query-string-query.html)

```
label !~ "*pant*"
@@ -293,7 +293,7 @@ not 2018 <= date <= 2019

### Query string

-Query string will match the `DefaultQueryStringFieldPath` (from `conf.ini`) with [ES query string format](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html).
+Query string will match the `DefaultQueryStringFieldPath` (from `conf.ini`) with [ES query string format](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-query-string-query.html).

```
"foam cage"
@@ -930,7 +930,7 @@ Advance interval exists, such as:
3h # 3 hours
```

-See [Time units](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/common-options.html#time-units)
+See [Time units](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/common-options.html#time-units)

### Sub aggregations

23 changes: 12 additions & 11 deletions scripts/elastic.py
@@ -16,11 +16,12 @@ def options():
parser.add_argument("index_name")
parser.add_argument("--overwrite", action="store_true")
parser.add_argument("--hosts", nargs='+')
+parser.add_argument("--http-auth")
return parser.parse_args()


-def create_index(filepath, schema_filepath, index, overwrite=False, hosts=None):
-elastic = elastic_connect(hosts=hosts)
+def create_index(filepath, schema_filepath, index, overwrite=False, hosts=None, http_auth=None):
+elastic = elastic_connect(hosts=hosts, http_auth=http_auth)

with open(filepath) as fd:
data = loads_ndjson(fd)
@@ -43,12 +44,11 @@ def loads_ndjson(fd):
yield json.loads(line)


-def _document_wrapper(index, documents, doc_type, id_getter, operation):
+def _document_wrapper(index, documents, id_getter, operation):
for doc in documents:

wrapper = {"action": {operation: {
"_index": index,
-"_type": doc_type,
"_id": id_getter(doc)
}}}

@@ -77,21 +77,21 @@ def _manager(elastic, documents, size, operation):
_sender(elastic, bulk, operation)


-def bulk(elastic, index, doc_type, documents, id_getter, bulk_size=100, operation="index"):
-docs = _document_wrapper(index, documents, doc_type, id_getter, operation)
+def bulk(elastic, index, documents, id_getter, bulk_size=100, operation="index"):
+docs = _document_wrapper(index, documents, id_getter, operation)
_manager(elastic, docs, bulk_size, operation)


def insert(elastic, index, data):
logging.info("Start insertion ...")
id_getter = lambda d: d["id"]
-bulk(elastic, index, "document", data, id_getter)
+bulk(elastic, index, data, id_getter)
logging.info("Done")


def _create_index(elastic, index, schema_filepath):
schema = load_schema(schema_filepath)
-res = elastic.indices.create(index=index, body=schema, request_timeout=60)
+res = elastic.indices.create(index=index, mappings=schema, request_timeout=60)
if "acknowledged" not in res:
logging.error("Index creation response:\n{res}")
raise Exception("Failed to create index: {index_name}")
@@ -102,11 +102,12 @@ def load_schema(filepath):
return json.load(fd)


-def elastic_connect(hosts=None):
+def elastic_connect(hosts=None, http_auth=None):
""" Create new elastic connection """
-es_hosts = hosts if hosts else ["http://localhost:9200"]
+es_hosts = hosts if hosts else ["http://localhost"]
kwargs = {
"hosts": _normalize_hosts(es_hosts),
+"http_auth": http_auth,
"retry_on_timeout": True,
"timeout": 30
}
@@ -118,5 +119,5 @@
args = options()
create_index(
args.filepath, args.schema_filepath, args.index_name,
-overwrite=args.overwrite, hosts=args.hosts
+overwrite=args.overwrite, hosts=args.hosts, http_auth=args.http_auth
)
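Since mapping types are gone in ES 7, the bulk action header built by `_document_wrapper` no longer carries `_type`. A runnable sketch of the idea (the yielded shape is simplified and assumed, not taken verbatim from the repo):

```python
import json

def document_wrapper(index, documents, id_getter, operation="index"):
    # Action header now contains only _index and _id; "_type" is gone in ES 7
    for doc in documents:
        yield {"action": {operation: {"_index": index, "_id": id_getter(doc)}},
               "document": doc}

docs = [{"id": 1, "label": "pant"}, {"id": 2, "label": "cage"}]
wrapped = list(document_wrapper("myindex", docs, lambda d: d["id"]))
print(json.dumps(wrapped[0]["action"]))
```

Callers such as `bulk()` then only need the index name and an `id_getter`, which is why the `doc_type` argument disappears from every call site in this commit.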
59 changes: 0 additions & 59 deletions scripts/schema.json

This file was deleted.

14 changes: 1 addition & 13 deletions sel/meta.py
@@ -1,28 +1,16 @@
from datetime import datetime


-def get_doc_type(mapping):
-""" Get first found doc_type in mapping schema """
-keys = list(mapping.keys())
-if "settings" in keys:
-del keys[keys.index("settings")]
-if not keys:
-return None
-return keys[0]

def read_meta(mappings):
results = []

for index, schema in mappings.items():
-doc_type = get_doc_type(schema["mappings"])
settings = schema["settings"]
creation_date_in_ms = float(settings["index"]["creation_date"]) / 1000

results.append({
"index": index,
-"doc_type": doc_type,
-"meta": schema["mappings"].get(doc_type, {}).get("_meta"),
+"meta": schema["mappings"].get("_meta"),
"creation_date": datetime.fromtimestamp(creation_date_in_ms)
})

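In 7.x, `_meta` hangs directly off `mappings`, so `read_meta` drops the `get_doc_type` indirection entirely. A self-contained sketch of the updated function against a fabricated index response (the sample data is invented for illustration):

```python
from datetime import datetime

def read_meta(mappings):
    # Same shape as the updated sel/meta.py: no doc_type lookup step
    results = []
    for index, schema in mappings.items():
        settings = schema["settings"]
        creation_date_in_ms = float(settings["index"]["creation_date"]) / 1000
        results.append({
            "index": index,
            "meta": schema["mappings"].get("_meta"),
            "creation_date": datetime.fromtimestamp(creation_date_in_ms),
        })
    return results

sample = {
    "myindex": {
        "mappings": {"_meta": {"version": 1},
                     "properties": {"label": {"type": "keyword"}}},
        "settings": {"index": {"creation_date": "1694611602251"}},
    }
}
print(read_meta(sample)[0]["meta"])
```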
14 changes: 3 additions & 11 deletions sel/schema_reader.py
@@ -169,14 +169,6 @@ def short_path(self, path, sub_properties=None):
return founds[0]["short_path"]


-def schema_root(self, schema):
-doc_type = meta.get_doc_type(schema)
-if not doc_type:
-raise InternalServerError(f"Corrupted index: {index}")
-
-return schema[doc_type]["properties"]


def schema_object_matching(self, field, path, root):
"""
TODO: REFACTO
@@ -225,7 +217,7 @@ def schema_finder(self, query_field, root=None, path=[], nested=None):
## Route Managing
root_object = False
if root is None:
-root = self.schema_root(self.schema)
+root = self.schema["properties"]
elif isinstance(root, str):
root_object = True
root, field, path = self.__root_from_string(root, field)
@@ -310,7 +302,7 @@ def schema_walker(self, path, position=0, root=None, nested=None):
"""
key = path[position]
if root is None:
-root = self.schema_root(self.schema)
+root = self.schema["properties"]
if key in root:
elm = root[key]
if len(path) > 1:
@@ -339,7 +331,7 @@ def list_field(self, root=None, path=[], nested=None, sub_properties=None):
List all existing fields of the schema in details
"""
if root is None:
-root = self.schema_root(self.schema)
+root = self.schema["properties"]
elif isinstance(root, str):
root, _, _ = self.__root_from_string(root, None)

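The schema root moves up one level across these call sites: a 6.x mapping nested `properties` under the doc type, while 7.x exposes it directly. A small illustration (both sample schemas are fabricated):

```python
# 6.x-style mapping: properties nested under the doc type ("document")
schema_6x = {"document": {"properties": {"label": {"type": "keyword"}}}}
# 7.x-style mapping: properties at the top level
schema_7x = {"properties": {"label": {"type": "keyword"}}}

# Old: root = schema[doc_type]["properties"]  (via the deleted schema_root/get_doc_type)
root_old = schema_6x["document"]["properties"]
# New: root = schema["properties"]
root_new = schema_7x["properties"]

print(root_old == root_new)
```

This is why `schema_root` could be deleted outright instead of being rewritten.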
2 changes: 1 addition & 1 deletion sel/scroll.py
@@ -11,7 +11,7 @@ def _reader(data):
def scroll(elastic, index, query, cash_time, scroll_id=None):

if not scroll_id:
-res = elastic.search(index=index, body=query, scroll=cash_time)
+res = elastic.search(index=index, scroll=cash_time, **query)
else:
res = elastic.scroll(scroll_id=scroll_id, scroll=cash_time)

19 changes: 6 additions & 13 deletions sel/sel.py
@@ -69,7 +69,7 @@ def get_schema(self, index: str) -> dict:
.. code-block:: python
> sel.get_schema("foo")
-{doc_type: {mapping ... }}
+{mapping ... }
"""

@@ -95,7 +95,7 @@ def list_index(self, index: str = None) -> List[dict]:
> sel.list_index()
[
-{'index': 'myindex', 'doc_type': 'document', 'meta': None, 'creation_date': datetime.datetime(2023, 9, 13, 13, 26, 42, 251000)},
+{'index': 'myindex', 'meta': None, 'creation_date': datetime.datetime(2023, 9, 13, 13, 26, 42, 251000)},
...
]
@@ -431,13 +431,8 @@ def search(self, index: str, query: dict, no_deleted: bool = True) -> dict:
warns = query_obj["warns"]

self.logger.debug("es query = %s" % json.dumps(query_obj["elastic_query"]))
-response = self.elastic.search(
-index=index,
-body=query_obj["elastic_query"],
-doc_type=self.conf["Elasticsearch"]["DocType"],
-_source=True
-#analyze_wildcard=True # Does not exists in 5.x ?
-)
+response = self.elastic.search(index=index, _source=True, **query_obj["elastic_query"])
+#analyze_wildcard=True # Does not exists since 5.x ?

results = self.PostFormater(warns, query_obj["query_data"], response)

@@ -675,12 +670,11 @@ def __delete_documents(

# Update documents in indexes
count = 0
-doc_type = self.conf["Elasticsearch"]["DocType"]
id_getter = lambda d: d["id"]

for index_name, documents in index_documents.items():
count += len(documents)
-upload.bulk(self.elastic, index_name, doc_type, documents, id_getter)
+upload.bulk(self.elastic, index_name, documents, id_getter)

return {"action": action_id, "count": count}

@@ -757,11 +751,10 @@ def __really_delete_documents(self, index: str, query: dict) -> int:

# Update documents in indexes
count = 0
-doc_type = self.conf["Elasticsearch"]["DocType"]
id_getter = lambda d: d["_id"]
for index_name, documents in index_documents.items():
count += len(documents)
-upload.bulk(self.elastic, index_name, doc_type, documents, id_getter, operation="delete")
+upload.bulk(self.elastic, index_name, documents, id_getter, operation="delete")

return count

