Add data endpoint that can parse webserver logs. #1483

Merged: 9 commits, Mar 30, 2021

Conversation

jpwhite4
Member

This adds a couple of libraries that can parse webserver
log files, parse user agent strings and a geoip lookup library.

The geoip library is Apache Licenced, but the corresponding
geoip database is not redistributable so is not included.

The WebServerLogFile endpoint has configuration settings to
specify the log file format (and hence the records extracted) and
a config setting for the path to the geoip database. If there
is no geoip database then it will still work.

See the xdmod-ood module for CI tests and usage etc.
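
To illustrate how a log format setting determines which records are extracted, here is a minimal sketch in Python (not the PHP used in this PR). The format names, the `LOG_FORMATS` table, and `parse_line` are hypothetical; the patterns follow the widely used Apache common and combined log layouts and are not taken from the endpoint's actual configuration.

```python
import re

# Hypothetical mapping from a configured format name to a regular expression.
# These names and patterns are illustrative assumptions, not the endpoint's config.
LOG_FORMATS = {
    "common": re.compile(
        r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)$'
    ),
    "combined": re.compile(
        r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
        r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
    ),
}


def parse_line(line, log_format="combined"):
    """Return a dict of named fields for one log line, or None if it does not match."""
    match = LOG_FORMATS[log_format].match(line.strip())
    return match.groupdict() if match else None
```

Because the fields come from the named groups of the selected pattern, choosing `"common"` instead of `"combined"` yields records without `referer` or `user_agent`.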

@jpwhite4 added the labels enhancement (Enhancement of the functionality of an existing feature) and Category:ETL (Extract Transform Load) on Jan 14, 2021
@jpwhite4 added this to the 9.5.0 milestone on Jan 14, 2021
@plessbd
Contributor

plessbd commented Jan 14, 2021

for reference: https://github.com/jpwhite4/xdmod-ood
But there are no tests in there... Just the artifacts

@jpwhite4 force-pushed the weblogs branch 2 times, most recently from 996defa to 8e71938, on January 29, 2021 15:51
This adds a couple of libraries that can parse webserver
log files, parse user agent strings and a geoip lookup library.

The geoip library is Apache Licenced, but the corresponding
geoip database is not redistributable so is not included.

The WebServerLogFile endpoint has configuration settings to
specify the log file format (and hence the records extracted) and
a config setting for the path to the geoip database. If there
is no geoip database then it will still work.

The regular expression ingestor supports using the PHP
preg_filter() function to transform data.
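
For readers unfamiliar with preg_filter(): it behaves like preg_replace(), except that for a single string subject it returns the substituted string only when the pattern matched, and null otherwise. A rough Python analogue is sketched below. The helper name `regex_filter` and the example URL pattern are hypothetical and are not taken from the ingestor's configuration.

```python
import re


def regex_filter(pattern, replacement, subject):
    """Return the substituted string if the pattern matched, otherwise None.

    Approximates PHP's preg_filter() for a single string subject.
    """
    result, count = re.subn(pattern, replacement, subject)
    return result if count > 0 else None


# Hypothetical transform: keep only the application name from a request path.
APP_PATTERN = r'^/apps/([^/]+).*$'
app_name = regex_filter(APP_PATTERN, r'\1', '/apps/dashboard/index.html')
no_match = regex_filter(APP_PATTERN, r'\1', '/favicon.ico')
```

Returning nothing on a non-match lets a caller tell "transformed" apart from "pattern did not apply", which a plain replace cannot do.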

See the xdmod-ood module for CI tests and usage etc.
Originally the code would only try to add location properties
if the geoip file was present. This changes it so that the properties
are always present even if the file is absent. This makes it easier
to support sites that don't have access to the location file.
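
The "always present" behaviour described above can be sketched as follows, in Python rather than the PR's PHP. The property names in `LOCATION_KEYS`, the `UNKNOWN` placeholder, and the `lookup` callable (standing in for a geoip database reader) are all hypothetical; the actual property names and default values used by the endpoint are not stated here.

```python
# Hypothetical property names and placeholder value, for illustration only.
LOCATION_KEYS = ("city", "subdivision", "country")
UNKNOWN = "NA"


def add_location(record, lookup=None):
    """Return a copy of record in which every location key is set.

    lookup is a callable mapping an IP address to a dict of location
    properties, or None when no geoip database is available.
    """
    found = lookup(record["host"]) if lookup is not None else {}
    enriched = dict(record)
    for key in LOCATION_KEYS:
        # Fall back to the placeholder so downstream code sees a fixed schema.
        enriched[key] = found.get(key, UNKNOWN)
    return enriched
```

With this shape, downstream steps can rely on the same set of columns whether or not a site has the location file.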
@jpwhite4 requested a review from jsperhac on March 29, 2021 14:03
@jsperhac
Contributor

I eyeballed this PR and approved it prior to merging with 9.5. Looks like there are some tests failing, however.

@ryanrath
Contributor

@jsperhac I believe that the test failures are due to the chromium update that's breaking the report generator.

@jpwhite4 merged commit 2f7fead into ubccr:xdmod9.5 on Mar 30, 2021
@jpwhite4 deleted the weblogs branch on March 30, 2021 16:03
Labels
Category:ETL Extract Transform Load enhancement Enhancement of the functionality of an existing feature
4 participants