Missing support for arbitrary dtypes and frames dimension #25

Open
pokecheater opened this issue Nov 11, 2021 · 2 comments

Comments

@pokecheater

Hey ho imgaug team :),

When I used your library, it was not possible to use a dtype other than uint8. Is that correct, and if so, why? I see no reason for this restriction, since most augmenters, such as skewing or rotation, would work just fine with other dtypes.

Next question: how can I augment multi-frame images with shape [frames, height, width, channels]? If I pass such an array to the augmenters directly, each frame gets its own augmentation applied, so the frames are not all augmented in the same manner.

Greetings and thanks in advance :)

@pokecheater
Author

By the way, the information I referenced above is from this page: https://imgaug.readthedocs.io/en/latest/source/examples_basics.html
[screenshot of the documentation page]

@pokecheater
Author

pokecheater commented Nov 17, 2021

To overcome problem number 2, I found a non-intuitive workaround.
The solution is to interpret the frames dimension as additional channels. To do so, I move the frames into the channel dimension:

This is done by np.moveaxis followed by a reshape (whereby the moved frames dimension is merged into the existing channel dimension):

import numpy as np

# Move the frames axis (axis 0) to the end: [F, H, W, C] -> [H, W, C, F]
image = np.moveaxis(image_origin, 0, -1)
# Merge channels and frames into a single channel axis: [H, W, C, F] -> [H, W, C * F]
image = image.reshape(
    image.shape[0],
    image.shape[1],
    image.shape[2] * image.shape[3]
)

For example:
if my image has the shape [3, 1024, 1024, 1] (3 frames, height and width both 1024 pixels, and 1 channel), this leads first to [1024, 1024, 1, 3] and then to [1024, 1024, 3].
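To see that the fold keeps the frames aligned per pixel, here is a minimal self-contained sketch (dummy data; each frame is filled with its own index so we can track where its pixels end up):

import numpy as np

# Dummy multi-frame image: 3 frames, 4x4 pixels, 1 channel.
frames, h, w, c = 3, 4, 4, 1
image_origin = np.stack(
    [np.full((h, w, c), i, dtype=np.uint8) for i in range(frames)]
)

image = np.moveaxis(image_origin, 0, -1)   # -> (4, 4, 1, 3)
image = image.reshape(h, w, c * frames)    # -> (4, 4, 3)

# Channel k of the folded image is exactly frame k.
assert all((image[..., k] == k).all() for k in range(frames))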

Since my frames are now inside the channel dimension, the augmentation is applied to all channels with the same parameters, so all frames are transformed in the same way.

image_aug, polygons_aug = augmentation_flow(images=[image], polygons=polygons)

Afterwards, all I have to do is revert the frames transformation (luckily, I had stored the PIL image information in an img_meta dictionary).

# Split the channel axis back into (channels, frames): [H, W, C * F] -> [H, W, C, F]
image_aug = image_aug[0].reshape(
    img_meta["size"][0],
    img_meta["size"][1],
    img_meta["channels"],
    img_meta["frames"],
)
# Move the frames axis back to the front: [H, W, C, F] -> [F, H, W, C]
image_aug = np.moveaxis(image_aug, -1, 0)
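The whole round trip can be verified in isolation. A minimal sketch with random dummy data (the img_meta keys mirror the dictionary mentioned above; no augmentation is applied here, only the fold and its inverse):

import numpy as np

# Dummy input: 3 frames, 8x8 pixels, 2 channels.
image_origin = np.random.randint(0, 255, size=(3, 8, 8, 2), dtype=np.uint8)
img_meta = {"size": (8, 8), "channels": 2, "frames": 3}

# Fold frames into the channel axis: (3, 8, 8, 2) -> (8, 8, 6)
folded = np.moveaxis(image_origin, 0, -1).reshape(8, 8, 2 * 3)

# Invert: split channels back out, then move frames to the front.
restored = folded.reshape(
    img_meta["size"][0],
    img_meta["size"][1],
    img_meta["channels"],
    img_meta["frames"],
)
restored = np.moveaxis(restored, -1, 0)

assert np.array_equal(restored, image_origin)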

Works like a charm so far. :)

But the first problem, the dtype restriction, still persists. I am using uint16 images, and the requirement that all images must have numpy's dtype uint8 rules that out. So why is uint8 necessary? I cannot see any logical reason for it.

Thx in advance :)
