Add TiKV backend and Kafka index source #69

Merged
merged 38 commits into from
Dec 8, 2022
Commits
16cbfb9
Add TiKV as backend
maeb May 7, 2022
6f1c7e7
Promote index, server, loader and surt packages from internal
maeb Nov 4, 2022
2597972
Fix flag descriptions
maeb Nov 4, 2022
8a59e80
Fix config example
maeb Nov 4, 2022
58effc1
Don't log before logger is initialized
maeb Nov 4, 2022
1842610
Fix flags
maeb Nov 4, 2022
551cb85
Upgrade tikv go client
maeb Nov 4, 2022
b5387ff
Be explicit about it context
maeb Nov 4, 2022
3964928
Remove unused function mergeSort
maeb Nov 5, 2022
97a2e79
Promote timestamp package from internal
maeb Nov 5, 2022
d51b84a
Build docker image with golang 1.19
maeb Nov 5, 2022
dc01fc9
Build and push image on PR to main branch
maeb Nov 5, 2022
dc3d4ac
Update dependencies
maeb Nov 5, 2022
81324ac
Simplify worker scheduling
maeb Nov 5, 2022
001402b
Do not allow search with no key
maeb Nov 5, 2022
c131894
Updated go.*
maeb Nov 5, 2022
a1d6c79
Linter
maeb Nov 5, 2022
9b85cee
Update distroless base image tag
maeb Nov 5, 2022
e56c735
Indexer log warning on errors
maeb Nov 5, 2022
b01a353
Add concept of database to badger and tikv
maeb Nov 6, 2022
0d4d845
Add database concept to badger index
maeb Nov 6, 2022
99885a2
Differentiate between already indexed errors and other errors
maeb Nov 6, 2022
5efd9d6
Update GitHub actions
maeb Nov 6, 2022
4f735fa
Be lenient when sort=closest and closest=""
maeb Nov 6, 2022
02fc725
tikv: Don't use invalid iterators
maeb Nov 7, 2022
9341acf
tikv: Fix iter prefix check
maeb Nov 7, 2022
9014d7b
tikv: Fix closest iterator
maeb Nov 7, 2022
f124ee3
tikv: snapshot for reads, retry failed commits and fix reverse sort
maeb Nov 7, 2022
a5470ce
tikv: Remove default limit
maeb Nov 9, 2022
97bbbca
Add kafka index source
maeb Nov 29, 2022
f77ad37
Refactor and consolidate
maeb Nov 29, 2022
c75e613
Update protoc to version 21.10
maeb Dec 1, 2022
1e3b3d1
Promote logger package out of internal
maeb Dec 1, 2022
b48b83b
Close WarcFileReader independent of context
maeb Dec 1, 2022
b58a4ba
Close revisit record before merge
maeb Dec 1, 2022
419f8a5
Consolidate error handling in handlers
maeb Dec 1, 2022
6ef0747
Update dependencies
maeb Dec 1, 2022
9e7a577
Update README and example config
maeb Dec 1, 2022
11 changes: 7 additions & 4 deletions .github/workflows/release.yaml
@@ -3,7 +3,7 @@ name: release
 on:
   push:
     branches:
-      - master
+      - main
     tags:
       - v*

@@ -14,11 +14,14 @@ env:
 jobs:
   build:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write

     steps:
       - name: Extract metadata (tags, labels) for Docker
         id: meta
-        uses: docker/metadata-action@v3
+        uses: docker/metadata-action@v4
         with:
           images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
           tags: |
@@ -27,14 +30,14 @@ jobs:
             type=ref,event=pr

       - name: Log in to the container registry
-        uses: docker/login-action@v1
+        uses: docker/login-action@v2
         with:
           registry: ${{ env.REGISTRY }}
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}

       - name: Build and push Docker image
-        uses: docker/build-push-action@v2
+        uses: docker/build-push-action@v3
         with:
           push: true
           build-args: |
39 changes: 33 additions & 6 deletions .github/workflows/test.yaml
@@ -6,21 +6,25 @@ on:
 permissions:
   contents: read

+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}
+
 jobs:
   unit_test:
     name: Golang unit tests
     runs-on: ubuntu-latest

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3

-      - uses: actions/setup-go@v2
+      - uses: actions/setup-go@v3
         with:
-          go-version: '^1.16'
+          go-version: '^1.19'

       - name: Cache go modules
-        uses: actions/cache@v2
+        uses: actions/cache@v3
         with:
           path: |
             ~/.cache/go-build
@@ -35,9 +39,32 @@ jobs:
     name: Linting
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
-      - uses: golangci/golangci-lint-action@v2
+      - uses: actions/checkout@v3
+      - uses: actions/setup-go@v3
+        with:
+          go-version: '^1.19'
+      - uses: golangci/golangci-lint-action@v3
         with:
           version: latest
           # Enable additional linters (see: https://golangci-lint.run/usage/linters/)
           args: -E "bodyclose" -E "dogsled" -E "durationcheck" -E "errorlint" -E "forcetypeassert" -E "noctx" -E "exhaustive" -E "exportloopref" --timeout 3m0s
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Extract metadata (tags, labels) for Docker
+        id: meta
+        uses: docker/metadata-action@v4
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          tags: |
+            type=semver,pattern={{version}}
+            type=ref,event=branch
+            type=ref,event=pr
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@v3
+        with:
+          build-args: |
+            VERSION=${{ steps.meta.outputs.version }}
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,4 +1,4 @@
-FROM golang:1.16 as build
+FROM golang:1.19 as build

 WORKDIR /build

@@ -21,7 +21,7 @@ RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
     -ldflags "-s -w -X github.com/nlnwa/gowarcserver/cmd/version.Version=${VERSION}"


-FROM gcr.io/distroless/base
+FROM gcr.io/distroless/base-debian11
 COPY --from=build /build/gowarcserver /
 EXPOSE 9999

26 changes: 0 additions & 26 deletions README.md
@@ -28,29 +28,3 @@ following the steps described in
 golangci-lint's [local installation](https://golangci-lint.run/usage/install/#local-installation) guide. Note that on **linux** the guide expects you to have `$GOPATH/bin` included in your `PATH` variable.

 golangci-lint run -E "bodyclose" -E "dogsled" -E "durationcheck" -E "errorlint" -E "forcetypeassert" -E "noctx" -E "exhaustive" -E "exportloopref" --timeout 3m0s
-
-## Configuration
-
-gowarcserver can be configured with a config file, environment variables and flags. Flags has precedence over
-environment variables that has precedence over config file entries. An environment variable match the uppercased flag
-name with underscore in place of dash.
-
-| Name | Type | Description | Default | Sub command |
-| ------------- | ------------- | ----------- | ------- | ------- |
-| config | string | Path to configuration file | ./config.yaml | global |
-| log-level | string | Log level. Legal values are "trace" , "debug", "info", "warn" or "error" | "info" | global |
-| port | int | Server port | 9999 | serve |
-| index | bool | Enable indexing when running server | true | serve |
-| watch | bool | Update index when files change | false | serve |
-| log-requests | bool | Enable request logging | false | serve |
-| dirs | list of paths | Comma separated list of directories to index | ["."] | index, serve |
-| db-dir | path | Location of index database | "." | index, serve |
-| max-depth | int | Maximum index recursion depth | 4 | index, serve |
-| include | list of suffixes | Only index files that match one of these suffixes | [] | index, serve |
-| workers | int | Number of index workers | 8 | index, serve |
-| compression | string | Database compression type. Legal values are: 'none', 'snappy', 'zstd' | "snappy" | index, serve |
-| bloom | bool | Enable bloom filter when indexing with "toc" format | true | index |
-| bloom-capacity | uint | Estimated bloom filter capacity | 1000 | index |
-| bloom-fp | float64 | Estimated bloom filter false positive rate | 0.01 | index |
-| child-urls | []string | Urls pointing to other gowarcserver processes running a server | [] | proxy |
-| child-query-timeout | Duration | Child query timeout a request to a child can take before resulting in timeout | 300ms | proxy |
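
For reviewers who want the rule from the removed Configuration section in concrete form, here is a minimal Go sketch of the precedence it describes: flags over environment variables over config file entries, with the environment variable name being the uppercased flag name with dashes replaced by underscores. This is not gowarcserver's actual implementation; `envName`, `resolve`, and the three maps are hypothetical stand-ins for the real flag, environment, and config sources, and only the standard library is used.

```go
package main

import (
	"fmt"
	"strings"
)

// envName maps a flag name to its environment variable name:
// uppercased, with every dash replaced by an underscore.
func envName(flag string) string {
	return strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
}

// resolve returns the effective value for a flag. An explicitly set flag wins
// over an environment variable, which wins over a config file entry, which
// wins over the default. The maps are hypothetical stand-ins for real sources.
func resolve(name, def string, flags, env, config map[string]string) string {
	if v, ok := flags[name]; ok {
		return v
	}
	if v, ok := env[envName(name)]; ok {
		return v
	}
	if v, ok := config[name]; ok {
		return v
	}
	return def
}

func main() {
	flags := map[string]string{"port": "8080"}
	env := map[string]string{"PORT": "7777", "LOG_LEVEL": "debug"}
	config := map[string]string{"log-level": "warn", "db-dir": "/data"}

	fmt.Println(resolve("port", "9999", flags, env, config))      // flag wins over env
	fmt.Println(resolve("log-level", "info", flags, env, config)) // env wins over config
	fmt.Println(resolve("db-dir", ".", flags, env, config))       // config wins over default
	fmt.Println(resolve("workers", "8", flags, env, config))      // nothing set, default used
}
```

The defaults passed in `main` (9999, "info", ".", 8) are taken from the table in the removed section; the override values are invented for illustration.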