Semantic Labeling #203

Draft
wants to merge 33 commits into
base: ros2-devel
Commits
8ed6683
init
sriramk117 Aug 7, 2024
dcb74b7
added neccessary callback functions
sriramk117 Aug 11, 2024
c611e78
implemented functionality to run sam and groundingdino
sriramk117 Aug 13, 2024
4fb0258
wrote vision pipeline and execute callback
sriramk117 Aug 14, 2024
5d81216
created result message returned by vision pipeline
sriramk117 Aug 14, 2024
bfcaeaa
modified launch file and created yaml file for parameters
sriramk117 Aug 14, 2024
19e9275
updated setup.py and modified parameters
sriramk117 Aug 16, 2024
c29520d
Merge branch 'ros2-devel' into sriramk/semantic-labeling
sriramk117 Aug 16, 2024
4e7391f
added requirements to install and fixed imports
sriramk117 Aug 16, 2024
9c43fc4
changed grounding dino path and added checkpoint
sriramk117 Sep 12, 2024
31551e4
Added config file + fixed image transformations
sriramk117 Sep 12, 2024
3ec7b50
Added GroundingDINO visualization function
sriramk117 Sep 14, 2024
9dc9a40
created GroundingDINO publisher for testing
sriramk117 Sep 16, 2024
929e570
added more testing code for bbox visualization
sriramk117 Sep 16, 2024
e1ebf8b
fixed groundingdino results visualization
sriramk117 Sep 18, 2024
704caa1
corrected image preprocessing?
sriramk117 Sep 19, 2024
4f9305d
groundingdino works!
sriramk117 Sep 23, 2024
024c71c
masks are now displayable
sriramk117 Sep 23, 2024
c78cd4a
record vision pipeline inference time
sriramk117 Sep 24, 2024
e503800
wrote code to generate mask messages during action calls
sriramk117 Sep 27, 2024
648a46e
masks msgs are generated but action keeps aborting
sriramk117 Sep 30, 2024
3032f65
Added gpt-4o query functionality
sriramk117 Nov 7, 2024
85f9577
groundingdino can be downloaded via github url
Nov 8, 2024
0049598
updated comments/code quality changes
Nov 8, 2024
e9fd4d5
invoking gpt-4o has been transformed into a service
sriramk117 Nov 8, 2024
9d52d98
segment all items action now takes a single string as input
sriramk117 Nov 8, 2024
30bc036
added env variables
sriramk117 Nov 9, 2024
4d3b27c
environment variables not loading?
sriramk117 Nov 9, 2024
94af48e
ran black formatter
sriramk117 Nov 9, 2024
29ed345
Merge branch 'ros2-devel' into sriramk/semantic-labeling
sriramk117 Nov 9, 2024
23577ae
changes to segmentallitems node initializing it as a perception node
sriramk117 Nov 9, 2024
3688541
fixed error of topics not being received by segmentallitems action
sriramk117 Nov 9, 2024
195b123
code cleanup
Nov 9, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -2,6 +2,9 @@
build/
__pycache__/

# Environment Variables file
.env

Comment on lines +5 to +7
I don't see a .env file added in this PR, but I'm guessing this was more for personal use. I'd recommend omitting this change unless it's relevant for the functionality of the PR.

I noticed more references to an env file later in the code. Where exactly does this come into play?

Contributor Author
I'm adding environment variable functionality to our codebase so we can privately store API keys without exposing them publicly on GitHub. In this particular case, it is for accessing the PRL OpenAI API key to invoke GPT-4o.

Contributor Author
I'm assuming this may come in handy later on as well if we power perception w/ foundation models in the future.
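
For reference, the pattern described in this thread could look roughly like the sketch below. It is not the code in this PR: the helper names `load_env_file` and `get_api_key` and the variable name `OPENAI_API_KEY` are assumptions made for illustration, and a library such as python-dotenv could replace the hand-written parser.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env-style file (hypothetical helper).

    Blank lines, '#' comments, and lines without '=' are skipped.
    Variables already present in the environment are not overwritten.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, failing loudly if it is missing.

    The default variable name is an assumption, not taken from this PR.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key
```

With this approach the `.env` file stays on each developer's machine, which is why the `.gitignore` entry matters even though no `.env` file is part of the PR.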

# Compiled Object files
*.slo
*.lo

@@ -336,7 +336,7 @@ def update(self) -> py_trees.common.Status:
x_unit.vector, x_pos.vector
)

# # If you need to send a fixed food frame to the robot arm, e.g., to
# # If you need to send a fixed food frame to the robot arm, e.g., to

Unintended?

# # debug off-centering issues, uncomment this and modify the translation.
# deg = 90 # fork roll
# world_to_food_transform.transform.translation.x = 0.26262263022586224
1 change: 1 addition & 0 deletions ada_feeding_msgs/CMakeLists.txt
@@ -35,6 +35,7 @@ rosidl_generate_interfaces(${PROJECT_NAME}
"srv/AcquisitionReport.srv"
"srv/AcquisitionSelect.srv"
"srv/GetRobotState.srv"
"srv/GenerateCaption.srv"
"srv/ModifyCollisionObject.srv"

DEPENDENCIES geometry_msgs sensor_msgs std_msgs
5 changes: 5 additions & 0 deletions ada_feeding_msgs/action/SegmentAllItems.action
@@ -1,6 +1,8 @@
# The interface for an action that gets an image from the camera and returns
# the masks of all segmented items within that image.

# The list of input semantic labels for the food items on the plate
string caption
Comment on lines +4 to +5
Comment seems misleading; I suspect this was an old comment for item_labels.

---
# Possible return statuses
uint8 STATUS_SUCCEEDED=0
@@ -17,6 +19,9 @@ std_msgs/Header header
sensor_msgs/CameraInfo camera_info
# Masks of all the detected items in the image
ada_feeding_msgs/Mask[] detected_items
# A list of semantic labels corresponding to each of the masks of detected
# items in the image
string[] item_labels
---
# How much time the action has spent segmenting the food item
builtin_interfaces/Duration elapsed_time
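
A client consuming this result might pair each mask with its label as sketched below. This assumes that `item_labels[i]` describes `detected_items[i]`, which is one reading of the comment in the action file rather than something the diff guarantees. The dataclasses are hypothetical plain-Python stand-ins for the generated ROS message types, and `pair_masks_with_labels` is an illustrative name.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mask:
    """Hypothetical stand-in for ada_feeding_msgs/Mask (subset of fields)."""

    item_id: str
    object_id: str
    confidence: float


@dataclass
class SegmentAllItemsResult:
    """Hypothetical stand-in for the action result (subset of fields)."""

    detected_items: List[Mask]
    item_labels: List[str]


def pair_masks_with_labels(result: SegmentAllItemsResult) -> List[Tuple[str, Mask]]:
    """Zip labels with masks, assuming the two arrays are index-aligned."""
    if len(result.detected_items) != len(result.item_labels):
        raise ValueError(
            "detected_items and item_labels differ in length: "
            f"{len(result.detected_items)} vs {len(result.item_labels)}"
        )
    return list(zip(result.item_labels, result.detected_items))
```

Checking the lengths up front makes a mismatch between the two parallel arrays fail early instead of silently dropping items in `zip`.
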
3 changes: 3 additions & 0 deletions ada_feeding_msgs/msg/Mask.msg
@@ -19,6 +19,9 @@ float64 average_depth
# An arbitrary ID that defines the segmented item
string item_id

# An ID that semantically labels a specific, segmented item
string object_id

# A score that indicates how confident the segemntation algorithm is in
# this mask.
float64 confidence
11 changes: 11 additions & 0 deletions ada_feeding_msgs/srv/GenerateCaption.srv
@@ -0,0 +1,11 @@
# The interface for a service that takes in a list of input labels
# describing the food items on a plate and returns a sentence caption compiling
# these labels used as a query for GroundingDINO detection.

# A list of semantic labels corresponding to each of the masks of detected
# items in the image
string[] input_labels
---
# A sentence caption compiling the semantic labels used as a query for
# GroundingDINO to perform bounding box detections.
string caption
Contributor
Nit: add newlines at the end of files. (I know not all files have it, but in general it is a best practice so we should enforce it on new/modified files)
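
To make the `GenerateCaption.srv` contract above concrete, here is a deterministic stand-in that maps `input_labels` to a `caption`. It is not the PR's implementation: the commit history indicates the caption comes from a GPT-4o query, whereas this sketch simply joins the labels. The function name `compile_caption` is hypothetical, and the period-separated format reflects my understanding of how GroundingDINO prompts are usually written, so treat it as an assumption.

```python
from typing import List


def compile_caption(input_labels: List[str]) -> str:
    """Join labels into a single period-separated query string.

    Labels are lowercased and stripped; empty strings and duplicates are
    dropped while preserving the original order.
    """
    cleaned: List[str] = []
    for label in input_labels:
        normalized = label.strip().lower()
        if normalized and normalized not in cleaned:
            cleaned.append(normalized)
    if not cleaned:
        return ""
    return " . ".join(cleaned) + " ."
```

For example, `compile_caption(["Grapes", "carrot ", "grapes"])` returns `"grapes . carrot ."`. A stand-in like this can also be handy for testing the segmentation action without an API key.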

@@ -170,6 +170,7 @@ def main(args=None):
# pylint: disable=import-outside-toplevel
from ada_feeding_perception.face_detection import FaceDetectionNode
from ada_feeding_perception.food_on_fork_detection import FoodOnForkDetectionNode
from ada_feeding_perception.segment_all_items import SegmentAllItemsNode
from ada_feeding_perception.segment_from_point import SegmentFromPointNode
from ada_feeding_perception.table_detection import TableDetectionNode

@@ -178,6 +179,7 @@ def main(args=None):
node = ADAFeedingPerceptionNode("ada_feeding_perception")
face_detection = FaceDetectionNode(node)
food_on_fork_detection = FoodOnForkDetectionNode(node)
segment_all_items = SegmentAllItemsNode(node) # pylint: disable=unused-variable
segment_from_point = SegmentFromPointNode(node) # pylint: disable=unused-variable
table_detection = TableDetectionNode(node)
executor = MultiThreadedExecutor(num_threads=16)