Support swanlab as logger #359

base: main
Conversation
Summary of Changes

Hello @binary-husky, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for Swanlab as an experiment monitoring and logging solution. It provides a new SwanlabMonitor class that seamlessly integrates with the existing monitoring infrastructure, allowing users to track various aspects of their machine learning experiments. The implementation includes robust handling for API key authentication, flexible configuration, and specialized logging for tabular data, ensuring a smooth experience for experiment tracking.
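As a quick illustration of the configuration surface described above, here is a minimal, hypothetical sketch. The api_key key and the SWANLAB_API_KEY fallback appear in the diffs below; the project key is an assumption, not confirmed API:

    # Hypothetical sketch of monitor_args for the SwanlabMonitor in this PR.
    # "api_key" and the SWANLAB_API_KEY fallback are taken from the diff below;
    # "project" is an assumed key.
    monitor_args = {
        "api_key": None,          # None -> fall back to the SWANLAB_API_KEY env var
        "project": "my-project",  # assumed: SwanLab project to log under
    }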
Code Review
This pull request introduces support for SwanLab as a monitoring tool by adding a SwanlabMonitor class and a corresponding smoke test. The implementation is well-structured and follows the pattern of other monitors in the project. My review focuses on improving robustness and error handling, particularly by replacing silent exception handling (except: pass) with proper logging to avoid hiding potential issues with login, data logging, and run finalization.
Regarding your question about creating separate experiments for the trainer and explorer: this is happening because the experiment name is suffixed with the role (trainer or explorer). To merge them into a single experiment, you could potentially use a shared run id passed to swanlab.init via monitor_args. You would need to generate a unique ID for the job and configure both the trainer and explorer to use it. This would group their logs. However, the current approach of separate experiments is also valid, is safer regarding concurrency, and is consistent with how other monitors like MlflowMonitor are implemented here.
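To make the shared-run idea concrete, here is a rough sketch. It assumes swanlab.init accepts id and resume parameters (available in recent SwanLab releases; verify against the installed version) and that the generated ID can be threaded through monitor_args to both roles; the plumbing shown is hypothetical:

    import uuid

    import swanlab

    # Generate one ID per job and hand it to both the trainer and the explorer
    # (e.g. via monitor_args) -- how the ID is propagated is up to the caller.
    shared_run_id = uuid.uuid4().hex

    # Both roles call init with the same id; "resume" lets the second caller
    # attach to the run the first one created. Parameter names are assumptions.
    run = swanlab.init(
        project="my-project",
        id=shared_run_id,
        resume="allow",
    )

Note that two processes writing to one run must coordinate their step counters, which is part of why the separate-experiment approach is safer regarding concurrency.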
            except Exception:
                # Best-effort login; continue to init which may still work if already logged in
                pass
Swallowing exceptions with except Exception: pass is risky, as it can hide important issues during login, such as an invalid API key or network problems. This could lead to silent failures where no data is logged. It's better to log a warning that the login attempt failed.
            except Exception as e:
                # Best-effort login; continue to init which may still work if already logged in
                get_logger(__name__).warning(f"Swanlab login failed, but continuing initialization: {e}")

        try:
            # Prefer run.finish() if available
            if hasattr(self, "logger") and hasattr(self.logger, "finish"):
                self.logger.finish()
            else:
                # Fallback to global finish
                swanlab.finish()
        except Exception:
            pass
Silently ignoring exceptions during close() with except Exception: pass can lead to data loss if swanlab.finish() fails to upload pending data. It's critical to log any errors that occur during this finalization step.
    def close(self) -> None:
        try:
            # Prefer run.finish() if available
            if hasattr(self, "logger") and hasattr(self.logger, "finish"):
                self.logger.finish()
            elif swanlab:
                # Fallback to global finish
                swanlab.finish()
        except Exception as e:
            logger = getattr(self, "console_logger", get_logger(__name__))
            logger.warning(f"Error closing Swanlab monitor: {e}")

			try:
				mon.close()
			except Exception:
				pass
Swallowing exceptions with except Exception: pass during cleanup in a test's error path can hide underlying issues in the close() method. Even though the test is already considered failed, logging the cleanup error can provide valuable debugging information.
			try:
				mon.close()
			except Exception as close_e:
				print(f"[cradle] Error closing monitor during error handling: {close_e}")

        api_key = (
            monitor_args.get("api_key") or os.environ.get("SWANLAB_API_KEY")
        )
The current implementation only checks for the SWANLAB_API_KEY environment variable. However, the smoke test for this monitor (tests/utils/monitor_swanlab_test.py) checks for a wider range of keys (SWANLAB_APIKEY, SWANLAB_KEY, SWANLAB_TOKEN). To improve robustness and align with potential swanlab conventions, consider checking for these other environment variables as well.
        api_key = monitor_args.get("api_key")
        if not api_key:
            for key in ("SWANLAB_API_KEY", "SWANLAB_APIKEY", "SWANLAB_KEY", "SWANLAB_TOKEN"):
                api_key = os.environ.get(key)
                if api_key:
                    break

        if config is not None:
            if hasattr(config, "flatten"):
                try:
                    cfg_dict = config.flatten()
                except Exception:
                    # Fallback: try to cast to dict if possible
                    try:
                        cfg_dict = dict(config)
                    except Exception:
                        cfg_dict = None
            else:
                try:
                    cfg_dict = dict(config)
                except Exception:
                    cfg_dict = None
The logic for converting the config object to a dictionary is quite complex with nested try-except blocks that silently ignore failures. This can be simplified and made more informative by logging a warning if the config conversion fails.
        if config is not None:
            try:
                if hasattr(config, "flatten"):
                    cfg_dict = config.flatten()
                else:
                    cfg_dict = dict(config)
            except Exception as e:
                get_logger(__name__).warning(f"Could not convert config to a dictionary for SwanLab: {e}")

        except Exception:
            # Fallback: log as CSV string if echarts table is unavailable
            csv_str = experiences_table.to_csv(index=False)
            swanlab.log({table_name: csv_str}, step=step)
The fallback to logging a table as a CSV string is a good defensive measure. However, it happens silently. It would be beneficial to log a warning when this fallback is triggered, so the user is aware that the richer ECharts table could not be created and can investigate the root cause.
        except Exception as e:
            self.console_logger.warning(
                f"Failed to log table '{table_name}' as SwanLab ECharts Table, falling back to CSV. Error: {e}"
            )
            # Fallback: log as CSV string if echarts table is unavailable
            csv_str = experiences_table.to_csv(index=False)
            swanlab.log({table_name: csv_str}, step=step)
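For contrast with this fallback, the primary path might look roughly like the following. swanlab.echarts.Table and its add(headers, rows) signature follow SwanLab's pyecharts-style charting API, but treat both as assumptions to verify against the installed version:

    import pandas as pd
    import swanlab

    def log_table(experiences_table: pd.DataFrame, table_name: str, step: int) -> None:
        # Build an ECharts table from the DataFrame's header row and data rows.
        table = swanlab.echarts.Table()
        table.add(list(experiences_table.columns), experiences_table.values.tolist())
        # Log the table object instead of the flat CSV string used in the fallback.
        swanlab.log({table_name: table}, step=step)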
Description
Support swanlab as logger. The current code works for me, but it creates two separate experiments for the trainer and the explorer. Maybe this can be improved?
Checklist
Please check the following items before the code is ready to be reviewed.