ARXIVNG-419 refactored indexing agent in Python #154
Merged
It was kind of a drag to depend on the Java-based MultiLangDaemon provided by AWS to run Kinesis consumers, especially when there is such good support for Kinesis in Python (via boto3). Implementing the consumer entirely in Python seemed like the right thing to do, so that we have direct visibility into the whole thing. It also gives us a lot more flexibility in how we implement the consumer, and frees us from some pointless dependencies (MultiLangDaemon used DynamoDB for checkpointing and CloudWatch for metrics, with little flexibility).
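To make the shape of a pure-Python consumer concrete, here is a minimal sketch. It is not the consumer from this PR: `MinimalConsumer`, `process_record`, and the in-memory `checkpoints` dict are hypothetical names, and the dict just stands in for whatever checkpoint store replaces DynamoDB. The only thing assumed about the client is that it behaves like a boto3 Kinesis client for `get_shard_iterator` and `get_records`.

```python
import time
from typing import Any, Callable, Dict, Optional


class MinimalConsumer:
    """Illustrative single-shard consumer (hypothetical, not the PR's code)."""

    def __init__(self, client: Any, stream_name: str, shard_id: str,
                 process_record: Callable[[dict], None],
                 checkpoints: Optional[Dict[str, str]] = None,
                 sleep_seconds: float = 1.0) -> None:
        self.client = client              # boto3-style Kinesis client
        self.stream_name = stream_name
        self.shard_id = shard_id
        self.process_record = process_record
        # Assumption: a plain dict is enough to show where checkpoints go.
        self.checkpoints = checkpoints if checkpoints is not None else {}
        self.sleep_seconds = sleep_seconds

    def _get_iterator(self) -> str:
        """Resume after the last checkpoint, or start from the oldest record."""
        kwargs = {"StreamName": self.stream_name, "ShardId": self.shard_id}
        last = self.checkpoints.get(self.shard_id)
        if last is not None:
            kwargs["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
            kwargs["StartingSequenceNumber"] = last
        else:
            kwargs["ShardIteratorType"] = "TRIM_HORIZON"
        return self.client.get_shard_iterator(**kwargs)["ShardIterator"]

    def run(self, max_batches: int = 1, limit: int = 100) -> int:
        """Fetch up to max_batches batches; return how many records were handled."""
        iterator = self._get_iterator()
        processed = 0
        for _ in range(max_batches):
            if iterator is None:          # shard has been closed
                break
            response = self.client.get_records(ShardIterator=iterator, Limit=limit)
            for record in response["Records"]:
                self.process_record(record)
                # Checkpoint only after the record was processed successfully.
                self.checkpoints[self.shard_id] = record["SequenceNumber"]
                processed += 1
            iterator = response.get("NextShardIterator")
            if not response["Records"]:
                time.sleep(self.sleep_seconds)   # back off when the shard is idle
        return processed
```

Because the checkpoint store is just an object passed in, swapping the dict for a file, a database row, or anything else does not touch the polling loop, which is the kind of flexibility the MultiLangDaemon did not offer.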
If we're happy with the `BaseConsumer` implementation, that's something that we could think about making part of arXiv base for future Kinesis integrations.
I added the following to the README:
Running the indexing agent.
The indexing agent is responsible for updating the search index as new papers
are published. By default, docker-compose will also start the search index
and a service called Localstack
that provides a local Kinesis stream for testing/development purposes.
To disable the agent and localstack, just comment out those services in `docker-compose.yml`.
The agent takes a little longer than the other services to start. Early in the
startup, you'll see something like:
A little while later, when localstack and the indexing agent are running, you
should see something like:
Note that Kinesis will be exposed locally on port 5586. It uses SSL,
but with an invalid certificate. You can connect to this local Kinesis using:
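From Python, one option is sketched below, assuming boto3 is installed. The helper names, the region, and the dummy credentials are placeholders rather than anything taken from this repository; Localstack is assumed not to validate credentials. Setting `verify=False` tells boto3 to skip certificate validation, which is what the invalid certificate requires.

```python
from typing import Any, Dict


def local_kinesis_kwargs(port: int = 5586, region: str = "us-east-1") -> Dict[str, Any]:
    """Build keyword arguments for boto3.client() pointing at the local stream."""
    return {
        "service_name": "kinesis",
        "region_name": region,                  # placeholder region
        "endpoint_url": f"https://localhost:{port}",
        "verify": False,                        # the local certificate is invalid
        "aws_access_key_id": "foo",             # dummy credentials (assumption)
        "aws_secret_access_key": "bar",
    }


def connect_local_kinesis(port: int = 5586) -> Any:
    """Return a boto3 Kinesis client for the local endpoint."""
    import boto3  # imported lazily so the helper above works without boto3
    return boto3.client(**local_kinesis_kwargs(port))
```

Calling `connect_local_kinesis().list_streams()` is a quick way to check that the endpoint is reachable; expect urllib3 to warn about the unverified HTTPS request.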
To verify that the agent is working correctly, try adding some records to
the stream.
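A minimal sketch of doing this from Python follows. It assumes a boto3 Kinesis client is passed in; `add_records`, the JSON payload shape (`document_id`), and the stream name in the usage note are hypothetical, so adjust them to whatever the agent actually expects.

```python
import json
from typing import Any, Iterable, List


def add_records(client: Any, stream_name: str, paper_ids: Iterable[str]) -> List[str]:
    """Put one record per paper ID on the stream; return the sequence numbers."""
    sequence_numbers = []
    for paper_id in paper_ids:
        # Assumption: the payload is a small JSON document; the real
        # record format used by the agent may differ.
        payload = json.dumps({"document_id": paper_id}).encode("utf-8")
        response = client.put_record(
            StreamName=stream_name,
            Data=payload,
            PartitionKey=paper_id,   # decides which shard receives the record
        )
        sequence_numbers.append(response["SequenceNumber"])
    return sequence_numbers
```

For example, `add_records(client, "SomeStreamName", ["1234.56789"])` with a client connected to the local endpoint, where the stream name and paper ID are placeholders.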
You should see these records being processed in the agent log output almost
immediately. For example: