Question of data format explanation #88

Open
LL0912 opened this issue May 9, 2022 · 1 comment

LL0912 commented May 9, 2022

Hello, I am trying to use the dataset, which I downloaded from Zenodo. However, I could not find any explanation of the data format, such as the meaning of each file name in "features" or of the dictionary keys in "labels.geojson". I can only guess their meaning from the code. Where can I find an official description of the dataset, including the file names? Can you help me?

@gabrieltseng
Collaborator

Hi there!

Apologies for the delayed reply. I'll add this to the main README but in the meantime:

labels.geojson

>>> import geopandas
>>> labels = geopandas.read_file("labels.geojson")
>>> labels.columns
Index(['harvest_date', 'planting_date', 'label', 'classification_label',
       'index', 'is_crop', 'lat', 'lon', 'dataset', 'collection_date',
       'export_end_date', 'is_test', 'geometry'],
      dtype='object')

There are two types of columns: RequiredColumns, which must be filled for all rows, and NullableColumns, which can have null values (see here).

Required Columns
  • index - the index of the row
  • is_crop - a boolean indicating whether the point being described contains cropland (at the date described by export_end_date)
  • lat - the latitude of the point
  • lon - the longitude of the point
  • dataset - the dataset which the point comes from
  • collection_date - the date at which the point was collected
  • export_end_date - we collect a year of data for each point; this value defines the last month for which data is exported (and therefore the entire timeseries, since we collect data for the year up to that point)
  • geometry - the geometry of the point. This may be a polygon (in which case lat/lon will be the central point of that field) or a point
  • is_test - a boolean indicating whether or not the point is part of the test data
Nullable Columns
  • harvest_date - the harvest date of the crop described at the lat/lon
  • planting_date - the planting date of the crop described at the lat/lon
  • label - the label - this will be the higher level agricultural land cover label describing the land use at the lat/lon for the given export_end_date
  • classification_label - the higher level classification of label, defined by the FAO's indicative crop classification (e.g. if a row has label="maize", then it would have classification_label="cereals")
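
As a rough illustration of the required/nullable split above, here is a minimal sketch that checks a single row (as a plain dict) for missing required values. The column names are copied from the lists above; the helper `missing_required` and the example row are hypothetical and not part of the dataset's code. Note that it only tests for `None`, so NaN values in a pandas frame would need a separate check.

```python
from typing import Any, Dict, List

# Column names copied from the two lists above.
REQUIRED_COLUMNS = [
    "index", "is_crop", "lat", "lon", "dataset",
    "collection_date", "export_end_date", "geometry", "is_test",
]
NULLABLE_COLUMNS = ["harvest_date", "planting_date", "label", "classification_label"]


def missing_required(row: Dict[str, Any]) -> List[str]:
    """Return the required columns that are absent or None in `row`."""
    return [col for col in REQUIRED_COLUMNS if row.get(col) is None]


# Made-up example row for illustration only (not taken from the dataset).
example_row = {
    "index": 0,
    "is_crop": True,
    "lat": -1.5,
    "lon": 30.0,
    "dataset": "example-dataset",
    "collection_date": "2020-06-01",
    "export_end_date": "2021-02-01",
    "geometry": "POINT (30.0 -1.5)",
    "is_test": False,
    # Nullable columns may legitimately be None.
    "harvest_date": None,
    "planting_date": None,
    "label": "maize",
    "classification_label": "cereals",
}

problems = missing_required(example_row)
print(problems)
```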

features

All features have the following naming convention: {index}_{dataset}.h5 - where these two values are defined above. So each feature is associated with a row in the labels.geojson.
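
The convention above could be used in both directions, roughly as sketched below. The helper names, the `features` directory name, and the example dataset name are hypothetical; the sketch also assumes that `index` is an integer, so the file stem can be split at the first underscore even if the dataset name itself contains underscores.

```python
from pathlib import Path
from typing import Tuple, Union


def feature_path(features_dir: Union[str, Path], index: int, dataset: str) -> Path:
    """Build the expected feature file path for one row of labels.geojson."""
    return Path(features_dir) / f"{index}_{dataset}.h5"


def parse_feature_name(filename: Union[str, Path]) -> Tuple[int, str]:
    """Recover (index, dataset) from a name of the form {index}_{dataset}.h5."""
    stem = Path(filename).stem
    # Split only at the first underscore: the dataset name may contain more.
    index, dataset = stem.split("_", 1)
    return int(index), dataset


# Hypothetical values, for illustration only.
path = feature_path("features", 3, "example_dataset")
print(path.name)
print(parse_feature_name(path))
```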

We are currently in the process of changing this convention so that names instead follow the format f"min_lat={min_lat}_min_lon={min_lon}_max_lat={max_lat}_max_lon={max_lon}_dates={start_date}_{end_date}_all".
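
For the bounding-box naming scheme, a name could be parsed along these lines. This is only a sketch under assumptions: the thread does not specify the date format or a file extension, so the regex accepts any underscore-free text for each field, and the example name is made up.

```python
import re
from typing import Dict, Union

# One named group per field of the naming scheme; each value is assumed to
# contain no underscores (an assumption, since the thread gives no examples).
NAME_PATTERN = re.compile(
    r"min_lat=(?P<min_lat>[^_]+)"
    r"_min_lon=(?P<min_lon>[^_]+)"
    r"_max_lat=(?P<max_lat>[^_]+)"
    r"_max_lon=(?P<max_lon>[^_]+)"
    r"_dates=(?P<start_date>[^_]+)_(?P<end_date>[^_]+)_all"
)


def parse_bbox_name(name: str) -> Dict[str, Union[float, str]]:
    """Split a bounding-box style name into coordinates (floats) and dates (strings)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the expected format: {name!r}")
    fields: Dict[str, Union[float, str]] = dict(match.groupdict())
    for key in ("min_lat", "min_lon", "max_lat", "max_lon"):
        fields[key] = float(fields[key])
    return fields


# Made-up name; the date format shown here is an assumption.
example_name = (
    "min_lat=-1.5_min_lon=30.0_max_lat=-1.4_max_lon=30.1"
    "_dates=2020-01-01_2021-01-01_all"
)
parsed = parse_bbox_name(example_name)
print(parsed)
```

The dates are left as strings on purpose, since their exact format is not stated here.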

Let me know if I can provide any further clarifications!

@gabrieltseng gabrieltseng self-assigned this Jun 9, 2022
@gabrieltseng gabrieltseng added the question Further information is requested label Oct 3, 2022