
Conversation

@binary-husky

Description

Support swanlab as a logger. The current code works for me, but it creates two separate experiments for the trainer and the explorer. Maybe this can be improved?

[screenshot: SwanLab showing two separate experiments, one for the trainer and one for the explorer]

Checklist

Please check the following items before the code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist
Contributor

Summary of Changes

Hello @binary-husky, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for Swanlab as an experiment monitoring and logging solution. It provides a new "SwanlabMonitor" class that seamlessly integrates with existing monitoring infrastructure, allowing users to track various aspects of their machine learning experiments. The implementation includes robust handling for API key authentication, flexible configuration, and specialized logging for tabular data, ensuring a smooth experience for experiment tracking.

Highlights

  • Introduced Swanlab Integration: Added a new "SwanlabMonitor" class to integrate Swanlab as an experiment logger, enabling tracking of machine learning experiments.
  • Flexible API Key Handling: Implemented logic to read Swanlab API keys from environment variables or configuration, with a best-effort login attempt.
  • Enhanced Data Logging: Provided support for logging "pandas.DataFrame" objects as interactive ECharts Tables within Swanlab, with a fallback to CSV string logging.
  • Comprehensive Initialization: The "SwanlabMonitor" constructor now handles project, group, name, role, and configuration parameters, including dynamic tag composition and stripping of "None" values for clean "swanlab.init" calls (see the sketch after this list).
  • Dedicated Smoke Test: A new smoke test file ("tests/utils/monitor_swanlab_test.py") has been added to verify the basic functionality and installation requirements of the "SwanlabMonitor".
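
For context, here is a minimal sketch of what such an initializer could look like. This is not the PR's actual code: the class name, the monitor_args dictionary, and the role-suffixed experiment name are assumptions based on the highlights above, and the role suffix is what produces the two separate experiments mentioned in the description.

    import logging
    import os
    from typing import Any, Dict, Optional

    import swanlab


    class SwanlabMonitor:
        """Minimal sketch of a SwanLab-backed monitor (not the PR's actual code)."""

        def __init__(
            self,
            project: str,
            name: str,
            role: str,
            config: Optional[Dict[str, Any]] = None,
            monitor_args: Optional[Dict[str, Any]] = None,
        ) -> None:
            monitor_args = monitor_args or {}
            # Best-effort login; init may still work if the user is already logged in.
            api_key = monitor_args.get("api_key") or os.environ.get("SWANLAB_API_KEY")
            if api_key:
                try:
                    swanlab.login(api_key=api_key)
                except Exception as e:
                    logging.getLogger(__name__).warning("SwanLab login failed: %s", e)
            # Role-suffixed experiment name; this yields one experiment per role.
            init_kwargs: Dict[str, Any] = {
                "project": project,
                "experiment_name": f"{name}-{role}",
                "config": dict(config) if config is not None else None,
            }
            # Strip None values so swanlab.init only receives explicit settings.
            self.logger = swanlab.init(**{k: v for k, v in init_kwargs.items() if v is not None})

        def log(self, data: Dict[str, Any], step: int) -> None:
            swanlab.log(data, step=step)

        def close(self) -> None:
            swanlab.finish()
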
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces support for SwanLab as a monitoring tool by adding a SwanlabMonitor class and a corresponding smoke test. The implementation is well-structured and follows the pattern of other monitors in the project. My review focuses on improving robustness and error handling, particularly by replacing silent exception handling (except: pass) with proper logging to avoid hiding potential issues with login, data logging, and run finalization.

Regarding your question about creating separate experiments for the trainer and explorer: this is happening because the experiment name is suffixed with the role (trainer or explorer). To merge them into a single experiment, you could potentially use a shared run id passed to swanlab.init via monitor_args. You would need to generate a unique ID for the job and configure both the trainer and explorer to use it. This would group their logs. However, the current approach of separate experiments is also valid, is safer regarding concurrency, and is consistent with how other monitors like MlflowMonitor are implemented here.
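
If you want to try that, here is a rough sketch. It assumes the installed swanlab version accepts id/resume arguments on swanlab.init (check your version; these argument names are not confirmed here) and that both roles receive the same monitor_args.

    import uuid

    import swanlab

    # Generate one id per job and pass it to both the trainer and the explorer,
    # e.g. via monitor_args = {"run_id": shared_run_id}.
    shared_run_id = uuid.uuid4().hex


    def init_shared_run(project: str, name: str, role: str, run_id: str):
        # Both roles attach to the same run, so their metrics land in one experiment.
        # "id" and "resume" are assumptions; older swanlab releases may not support
        # them, in which case the current role-suffixed approach is the safer choice.
        return swanlab.init(
            project=project,
            experiment_name=name,  # no role suffix
            id=run_id,
            resume="allow",
        )

Note that the caveat above still applies: two processes writing to one run is less safe with respect to concurrency than the current per-role experiments.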

Comment on lines +257 to +259
            except Exception:
                # Best-effort login; continue to init which may still work if already logged in
                pass

Severity: high

Swallowing exceptions with except Exception: pass is risky, as it can hide important issues during login, such as an invalid API key or network problems. This could lead to silent failures where no data is logged. It's better to log a warning that the login attempt failed.

            except Exception as e:
                # Best-effort login; continue to init which may still work if already logged in
                get_logger(__name__).warning(f"Swanlab login failed, but continuing initialization: {e}")

Comment on lines +333 to +341
        try:
            # Prefer run.finish() if available
            if hasattr(self, "logger") and hasattr(self.logger, "finish"):
                self.logger.finish()
            else:
                # Fallback to global finish
                swanlab.finish()
        except Exception:
            pass

Severity: high

Silently ignoring exceptions during close() with except Exception: pass can lead to data loss if swanlab.finish() fails to upload pending data. It's critical to log any errors that occur during this finalization step.

    def close(self) -> None:
        try:
            # Prefer run.finish() if available
            if hasattr(self, "logger") and hasattr(self.logger, "finish"):
                self.logger.finish()
            elif swanlab:
                # Fallback to global finish
                swanlab.finish()
        except Exception as e:
            logger = getattr(self, "console_logger", get_logger(__name__))
            logger.warning(f"Error closing Swanlab monitor: {e}")

Comment on lines +58 to +61
try:
mon.close()
except Exception:
pass

Severity: medium

Swallowing exceptions with except Exception: pass during cleanup in a test's error path can hide underlying issues in the close() method. Even though the test is already considered failed, logging the cleanup error can provide valuable debugging information.

			try:
				mon.close()
			except Exception as close_e:
				print(f"[cradle] Error closing monitor during error handling: {close_e}")

Comment on lines +251 to +253
api_key = (
monitor_args.get("api_key") or os.environ.get("SWANLAB_API_KEY")
)

Severity: medium

The current implementation only checks for the SWANLAB_API_KEY environment variable. However, the smoke test for this monitor (tests/utils/monitor_swanlab_test.py) checks for a wider range of keys (SWANLAB_APIKEY, SWANLAB_KEY, SWANLAB_TOKEN). To improve robustness and align with potential swanlab conventions, consider checking for these other environment variables as well.

        api_key = monitor_args.get("api_key")
        if not api_key:
            for key in ("SWANLAB_API_KEY", "SWANLAB_APIKEY", "SWANLAB_KEY", "SWANLAB_TOKEN"):
                api_key = os.environ.get(key)
                if api_key:
                    break

Comment on lines +291 to +305
        if config is not None:
            if hasattr(config, "flatten"):
                try:
                    cfg_dict = config.flatten()
                except Exception:
                    # Fallback: try to cast to dict if possible
                    try:
                        cfg_dict = dict(config)
                    except Exception:
                        cfg_dict = None
            else:
                try:
                    cfg_dict = dict(config)
                except Exception:
                    cfg_dict = None

Severity: medium

The logic for converting the config object to a dictionary is quite complex with nested try-except blocks that silently ignore failures. This can be simplified and made more informative by logging a warning if the config conversion fails.

        if config is not None:
            try:
                if hasattr(config, "flatten"):
                    cfg_dict = config.flatten()
                else:
                    cfg_dict = dict(config)
            except Exception as e:
                get_logger(__name__).warning(f"Could not convert config to a dictionary for SwanLab: {e}")

Comment on lines +321 to +324
except Exception:
# Fallback: log as CSV string if echarts table is unavailable
csv_str = experiences_table.to_csv(index=False)
swanlab.log({table_name: csv_str}, step=step)

Severity: medium

The fallback to logging a table as a CSV string is a good defensive measure. However, it happens silently. It would be beneficial to log a warning when this fallback is triggered, so the user is aware that the richer ECharts table could not be created and can investigate the root cause.

        except Exception as e:
            self.console_logger.warning(
                f"Failed to log table '{table_name}' as SwanLab ECharts Table, falling back to CSV. Error: {e}"
            )
            # Fallback: log as CSV string if echarts table is unavailable
            csv_str = experiences_table.to_csv(index=False)
            swanlab.log({table_name: csv_str}, step=step)
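
Putting the suggestion together, a hedged sketch of the full table-logging path could look like the following. The echarts.Table usage follows the pyecharts-style API that SwanLab's ECharts support is built on, so treat that call as an assumption; the CSV fallback relies only on pandas.DataFrame.to_csv and swanlab.log.

    import logging

    import pandas as pd
    import swanlab

    logger = logging.getLogger(__name__)


    def log_dataframe(table_name: str, experiences_table: pd.DataFrame, step: int) -> None:
        try:
            from swanlab import echarts  # may be missing in older swanlab versions

            # Build an ECharts table from the DataFrame's header row and data rows.
            table = echarts.Table()
            table.add(list(experiences_table.columns), experiences_table.values.tolist())
            swanlab.log({table_name: table}, step=step)
        except Exception as e:
            logger.warning(
                "Failed to log table '%s' as a SwanLab ECharts Table, falling back to CSV: %s",
                table_name, e,
            )
            swanlab.log({table_name: experiences_table.to_csv(index=False)}, step=step)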
