Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/geoip] Add GeoIP attributes provider interface #33269

Closed
rogercoll opened this issue May 28, 2024 · 3 comments
Closed

[processor/geoip] Add GeoIP attributes provider interface #33269

rogercoll opened this issue May 28, 2024 · 3 comments
Labels
enhancement New feature or request processor/geoip

Comments

@rogercoll
Copy link
Contributor

rogercoll commented May 28, 2024

Component(s)

processor/geoip

Is your feature request related to a problem? Please describe.

The initial proposal of the GeoIP processor was to work with the MaxMind GeoLite2 database to retrieve geolocation metadata. After discussions with the community, it was determined that the processor should be agnostic to the geolocation metadata provider. Therefore, it not strictly require a MaxMind database nor a license. This issue aims to track the proposal and implementation of this interface.

Any provider should be able to provide all attributes defined in #32663

Describe the solution you'd like

The interface shouldn't be exported, implementations should remain within the collector, for example, geoiprocessor/internal/provider/geoipprovider.go:

// GeoIPProvider defines methods for obtaining the geographical location based on the provided IP address.
type GeoIPProvider interface {
	// Location returns a set of attributes representing the geographical location for the given IP address. It requires a context for managing request lifetime.
	Location(context.Context, net.IP) (attribute.Set, error)
}

Configuration

The user should be able to define multiple providers (e.g., for data fallback). The configuration could include a map of providers to be used:

type Config struct {
         // Providers specifies the sources to extract geographical information about a given IP (e.g., GeoLite2).
	Providers map[string]provider.Config
	...
}

Similar to other components, the processor's factory will have access to a factory of providers, enabling it to create them on demand.

Describe alternatives you've considered

Another option is to define a more fine-grained interface based on the different types of location attributes the processor can add, thus improving the performance when only a subset of attributes is desired. For example:

// GeoIPProvider defines methods for obtaining the geographical location based on the provided IP address.
type GeoIPProvider interface {
	// Country returns a set of attributes representing the country location for the given IP address.
	Country(context.Context, net.IP) (attribute.Set, error)
	// City returns a set of attributes representing the city location for the given IP address.
	City(context.Context, net.IP) (attribute.Set, error)
	// Isp returns a set of attributes representing the ISP for the given IP address.
	Isp(context.Context, net.IP) (attribute.Set, error)
	// Asn returns a set of attributes representing the AS for the given IP address.
	Asn(context.Context, net.IP) (attribute.Set, error)
	// Domain returns a set of attributes representing the domain for the given IP address.
	Domain(context.Context, net.IP) (attribute.Set, error)
}

I find this approach impractical due to the numerous methods involved, which some providers might not support for retrieving all attributes. This limitaion could be resolved by adding a filter or namespace parameter:

// GeoIPProvider defines methods for obtaining the geographical location based on the provided IP address.
type GeoIPProvider interface {
	// Location returns a set of attributes representing the geographical location for the given IP address. It requires a context for managing request lifetime. An optional list of namespaces can be provided to specify the desired attributes to retrieve (e.g city, country, asn, etc.).
	Location(context.Context, net.IP, ...string) (attribute.Set, error)
}

Additional context

PoC: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33268/files

Please don't hesitate to ask if you need any modifications or clarifications. Any feedback or suggestions are highly appreciated.

@rogercoll
Copy link
Contributor Author

Copy link
Contributor

Pinging code owners for processor/geoip: @andrzej-stencel @michalpristas @rogercoll. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label May 28, 2024
andrzej-stencel added a commit that referenced this issue Jul 8, 2024
…ry (#33268)

**Description:** Define and parse a configuration for the geo IP
providers section. In addition, the Maxmind GeoIPProvider implementation
is included in the providers factories.

Example configuration:

```yaml
processors:
  geoip:
    providers:
      maxmind:
        database: "/tmp"
```

Implementation details:
- Custom Unmarshal implementation for the processor's component, this is
needed to dynamically load each provider's configuration:
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33268/files#diff-aed2c6fd774ef54a3039647190c67e28bd0fc67e008fdd5630b6201c550bd00aR46
. The approach is very similar to how the [hostmetrics receiver
loads](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/config.go#L44)
its scraper configuration.
- A new factory for the providers is included in order to retrieve their
default configuration and get the actual provider:
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33268/files#diff-2fbb171efac07bbf07c1bcb67ae981eb481e56491add51b6a137fd2c17eec9dcR27

**Link to tracking Issue:**
#33269

**Testing:** 

- Unit tests for the configuration unmarshall + factories
- A base mock structure is used through all the test files:
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33268/files#diff-28f4a173f1f4b5ccd3cf4c9f7f7b6bf864ef1567a28291322d7e94a9f63243aeR26
- Integration test to verify the behaviour of the processor + maxmind
provider
- The generation of the Maxmind databases has been moved to a [public
internal
package](https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33268/files#diff-83fc0ce7aa1f0495b4f4e5d5aabc2918162fec31ad323cc417b3f8c8eb5a00bcR14)
as being used for the unit and integration tests. Should we upload the
assembled database files instead?

**Documentation:** TODO

---------

Co-authored-by: Andrzej Stencel <[email protected]>
@rogercoll
Copy link
Contributor Author

Implemented in #33451

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request processor/geoip
Projects
None yet
Development

No branches or pull requests

2 participants