
[WIP] logging schema #1

Closed
wants to merge 24 commits into from

Conversation

@agola11 (Owner) commented Dec 20, 2022

See the temporary script tests/unit_tests/logging/agent_log.py as a demonstration of how persisted trace logging could work:

langchain.logging.get_logger() returns a shared logger instance

Output of tracing as JSON:

https://gist.github.com/agola11/ab7d5276973ccc6d6e0bc68a30bb119a
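As a rough illustration of the shared-logger pattern, the sketch below uses assumed names modeled on the methods discussed later in this PR; it is not the exact code in this change:

from typing import Any, Dict, List, Optional

class TraceLogger:
    """Collects run records in memory; a persisted implementation would write to a DB."""

    def __init__(self) -> None:
        self.runs: List[Dict[str, Any]] = []

    def log_llm_run_start(self, serialized: Dict[str, Any], prompts: List[str]) -> None:
        self.runs.append({"type": "llm", "serialized": serialized, "prompts": prompts})

    def log_llm_run_end(self, response: Dict[str, Any], error: Optional[str] = None) -> None:
        # Assumes a matching start was logged for the current run.
        self.runs[-1].update({"response": response, "error": error})

_shared_logger: Optional[TraceLogger] = None

def get_logger() -> TraceLogger:
    """Return the shared logger instance, creating it on first use."""
    global _shared_logger
    if _shared_logger is None:
        _shared_logger = TraceLogger()
    return _shared_logger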

Next steps:

  • Wrap this logic in an easy-to-use logger interface
  • Create example script to demonstrate usage
  • Add log retrieval methods
  • Clean up PR
  • Add tests
  • Think about how to support parallel logging runs
  • Work on any other suggestions
  • Discuss strategy for logging

@ShreyaR left a comment

Looks great! Some high-level comments:

  1. What are your thoughts on how to hook run creation into the Chain/LLM/Tool classes? One way to do this could be to add an optional RunLogger callback for each class that creates the relevant Run and logs it to data storage.
  2. Prompts and the generated output could potentially be pretty large, and are most probably going to be the main storage consumers for the DB. Wdyt about exploring NoSQL options for data storage? If retaining the relational structure is desirable, then another alternative (maybe for the future) is storing the inputs/outputs in object storage, and storing only the object storage URIs in the DB.
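For concreteness, the optional RunLogger callback idea in point 1 could look roughly like this; the names and signatures here are hypothetical, not part of this PR:

from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

class RunLogger(ABC):
    """Hypothetical callback that turns start/end events into persisted Run records."""

    @abstractmethod
    def on_run_start(self, serialized: Dict[str, Any], inputs: Dict[str, Any]) -> None:
        ...

    @abstractmethod
    def on_run_end(self, outputs: Dict[str, Any], error: Optional[str] = None) -> None:
        ...

class Chain:
    """Sketch of a chain that accepts an optional per-instance RunLogger."""

    def __init__(self, run_logger: Optional[RunLogger] = None) -> None:
        self.run_logger = run_logger

    def __call__(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        if self.run_logger is not None:
            self.run_logger.on_run_start({"name": self.__class__.__name__}, inputs)
        outputs = self._call(inputs)  # subclasses implement the actual chain logic
        if self.run_logger is not None:
            self.run_logger.on_run_end(outputs)
        return outputs

    def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return inputs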

success = Column(Boolean, default=True)
extra = Column(JSON, default={})
execution_order = Column(Integer, default=1)
serialized = Column(JSON)

Nice! Some thoughts & questions about the RunMixin schema:

  1. It would be great to add duration and error_msg fields to the schema.
  2. What's the scope of execution_order? Are runs expected to have globally unique execution_orders, or is it going to be unique for runs of one chain/tool/llm object?
  3. How do you plan to propagate success to parent runs?

Owner Author

  1. Yup! I was thinking of editing the schema to add start_time and end_time. Still thinking through the error logging, but yes, we'll need an error field.
  2. The latter. Execution order only comes into play when there are nested runs (e.g. LLMChain -> LLM) and specifies the order within the nested run.
  3. The parent run will be successful iff all of the child runs succeed. By having each child run have its own success field, it's easy to see exactly which parts of a long nested chain succeed and which ones fail. Hoping to build out the logger abstraction a bit more today to make this clearer.
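As a concrete illustration, the RunMixin columns with those additions might look roughly like this (assumed field names and types; the final schema may differ):

from sqlalchemy import Boolean, Column, DateTime, Integer, JSON, String

class RunMixin:
    """Sketch of the run schema with the suggested timing and error fields added."""

    start_time = Column(DateTime)           # replaces a single duration field
    end_time = Column(DateTime)
    error = Column(String, nullable=True)   # populated when the run does not succeed
    success = Column(Boolean, default=True)
    extra = Column(JSON, default={})
    execution_order = Column(Integer, default=1)
    serialized = Column(JSON)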

@hwchase17 (Collaborator)

We are probably going to have a concept of a prompt template and a prompt store to hold those templates/versions - how do these logs interact with that?

Also, I think the notion of "projects" or something similar may be useful - e.g. as we have multiple people doing multiple runs, I imagine one common viewing pattern would be "give me all logs for this experiment I ran", so the ability to easily separate logs by projects/tags seems useful.

Additionally, there is probably additional metadata that people may want to log - e.g. the user id for the user input, or the session id (if there is a concept of sessions that have multiple chain runs, it could be useful to group those). We should probably figure out (1) which, if any, we want to make columns vs. just putting them in some standard metadata field, and (2) how to pass and store that metadata field.

I see you put some logging in the search API; I'm assuming that is representative of the tools in general. The issue with that is that tools can currently be just a user-defined function. The two options we had talked about were (1) having a Tool interface, or (2) just putting the logging in the agent. Do you have any updated thoughts on this?

@hwchase17 (Collaborator)

Note that an agent is just a chain that repeatedly invokes an LLMChain and Tools

This is true, but it's missing some of the nuance: the agent is invoking an LLMChain (potentially multiple times) in order to get a specifically formatted output that it can parse into an "Action" and "Action Input", and it then receives an observation. There is sneakily a lot of logic in the agent abstraction (besides just a single LLM call) - multiple LLM calls, parsing, etc. - and in many ways the fact that the agent uses an LLMChain to decide what to do is an implementation detail; in the future it could be multiple LLMChains, etc.

I think we could probably benefit from trying to surface this structure a bit more in our logs. Again, maybe this is just naming, and calling it an AgentActionLog instead of a tool log or something.

@hwchase17 (Collaborator)

Some additional context (which I think I told @agola11 but maybe not @ShreyaR):

Currently there is a logging abstraction in LangChain, https://github.com/hwchase17/langchain/blob/master/langchain/logger.py, with a StdOut logger implementation (e.g. just printing).

The reason this abstraction exists, with the given interface, is that I was working with Sam Whitmore and she basically needed some logging functionality to use LangChain, so I added it in. Note that the logging abstraction for the LLM (inputs, prompt after formatting, outputs) is EXACTLY the schema she was using to log things (the agent format was made up, but was one that played nicely with how I wanted to print things).

And then the way she used it was:

import contextlib

# Global context shared with the logger for the duration of a single run.
CONTEXT = {}

@contextlib.contextmanager
def set_context(**kwargs):
    global CONTEXT
    CONTEXT = kwargs
    yield
    CONTEXT = {}

from langchain.logger import BaseLogger

class TestLogger(BaseLogger):
    def log_llm_inputs(self, inputs: dict, prompt: str, **kwargs):
        # Read whatever context (id, user id, session id, ...) was set for this run.
        global CONTEXT
        print(CONTEXT["id"])

import langchain
langchain.logger = TestLogger()

with set_context(id="foo"):
    react.run(question)

She filled TestLogger with logic to log to her DB, but the core pieces are:

import langchain
langchain.logger = TestLogger()

This sets the logger globally, so you don't have to pass it to every chain.

with set_context(id="foo"):
    react.run(question)

This was the method we used to pass context (id, user ids, session id, etc.) around: put it in a global variable for the scope of that run. This has the nice benefit of not having to remember to pipe this context through all the places in the chain (which can be very easy to forget to do, is a pain for developers, and complicates the interfaces a good bit).

We can obviously change this up, and should think about doing so, but I think some learnings were: (1) she really liked this logging format for LLM calls, e.g. having the LLM inputs / formatted prompt / output all in one place was super helpful (that being said, she was viewing the raw logs, so if we added some UI that combined all this into one view, maybe that's resolved); (2) passing loggers/context through all the chains, and remembering to do so, was a medium-large pain, so setting them globally was helpful for that.

@agola11 (Owner, Author) commented Dec 20, 2022

Looks great! Some high-level comments:

  1. What are your thoughts on how to hook run creation into the Chain/LLM/Tool classes? One way to do this could be to add an optional RunLogger callback for each class that creates the relevant Run and logs it to data storage.
  2. Prompts and the generated output could potentially be pretty large, and are most probably going to be the main storage consumers for the DB. Wdyt about exploring NoSQL options for data storage? If retaining the relational structure is desirable, then another alternative (maybe for the future) is storing the inputs/outputs in object storage, and storing only the object storage URIs in the DB.

@ShreyaR:

  1. Going to toy around with this today. I think the easiest approach would be to have two calls to the logger: one at the beginning and one at the end of each __call__ function for LLM and Chain, and then within Agent._call for tools.
  2. Yup, definitely. The plan is to store the raw prompts in object or blob storage somewhere, then include the handles in the DB. NoSQL could work too, but I think relational is a bit better because it 1) makes it easy to fetch individual runs without fetching the entire nested composition and processing it (i.e. more granular data access), 2) enforces a standard schema/fields, and 3) has better support for transactions/rollbacks and safeguards in general.

@agola11 (Owner, Author) commented Dec 20, 2022

We are probably going to have a concept of a prompt template and a prompt store to hold those templates/versions - how do these logs interact with that?

Also, I think the notion of "projects" or something similar may be useful - e.g. as we have multiple people doing multiple runs, I imagine one common viewing pattern would be "give me all logs for this experiment I ran", so the ability to easily separate logs by projects/tags seems useful.

Additionally, there is probably additional metadata that people may want to log - e.g. the user id for the user input, or the session id (if there is a concept of sessions that have multiple chain runs, it could be useful to group those). We should probably figure out (1) which, if any, we want to make columns vs. just putting them in some standard metadata field, and (2) how to pass and store that metadata field.

I see you put some logging in the search API; I'm assuming that is representative of the tools in general. The issue with that is that tools can currently be just a user-defined function. The two options we had talked about were (1) having a Tool interface, or (2) just putting the logging in the agent. Do you have any updated thoughts on this?

See my response to Shreya's comment: the prompt template is part of the LLM, which I assume would be part of the serialized objects. This and the prompt can be stored in blob storage, and the handle can be included in the DB.

All of the things you mentioned (projects, sessions, users, etc.) can be stored as part of the data model and have relationships to the Run entities in the DB.
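For illustration only, a grouping entity such as a session or project could relate to chain runs along these lines (hypothetical table and field names, not this PR's schema):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Session(Base):
    """Hypothetical grouping entity; a project or user table would look similar."""

    __tablename__ = "sessions"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    chain_runs = relationship("ChainRun", back_populates="session")

class ChainRun(Base):
    __tablename__ = "chain_runs"
    id = Column(Integer, primary_key=True)
    session_id = Column(Integer, ForeignKey("sessions.id"))
    session = relationship("Session", back_populates="chain_runs")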

I'm leaning towards just putting the logging in Agent._call. Will play around with it today.

@agola11 (Owner, Author) commented Dec 20, 2022

This is true, but it's missing some of the nuance: the agent is invoking an LLMChain (potentially multiple times) in order to get a specifically formatted output that it can parse into an "Action" and "Action Input", and it then receives an observation. There is sneakily a lot of logic in the agent abstraction (besides just a single LLM call) - multiple LLM calls, parsing, etc. - and in many ways the fact that the agent uses an LLMChain to decide what to do is an implementation detail; in the future it could be multiple LLMChains, etc.

I think we could probably benefit from trying to surface this structure a bit more in our logs. Again, maybe this is just naming, and calling it an AgentActionLog instead of a tool log or something.

Gotcha, yeah, we can do some renaming to better highlight this abstraction. Can also include a field in ChainRun to indicate whether the chain is running as an agent or not. Let's talk about this more when we sync.

@hwchase17 (Collaborator) left a comment

Is there any value in passing the id of the LLM/chain/agent to the log_*_end functions and validating that the id matches the one in the stack?

@@ -45,7 +46,10 @@ def generate(
) -> LLMResult:
"""Run the LLM on the given prompt and input."""
if langchain.llm_cache is None:
return self._generate(prompts, stop=stop)
get_logger().log_llm_run_start({"name": self.__class__.__name__}, prompts)
output = self._generate(prompts, stop=stop)
Collaborator

We also call generate down below when the cache exists but some things aren't in the cache; we'll want to put the logging there as well.

def _end_log_run(self) -> None:
"""Call at the end of a run."""

self._execution_order += 1
Collaborator

So execution order is based on the ending? Wouldn't it be more intuitive to have it based on the start?

Owner Author

I think either one works

Owner Author

Actually, execution order in this convention is helpful for pretty printing in the correct order, so I'd like to keep it this way

@agola11 (Owner, Author) commented Dec 21, 2022

Is there any value in passing the id of the LLM/chain/agent to the log_*_end functions and validating that the id matches the one in the stack?

The id actually isn't generated until the session gets committed; plus, the id would be another parameter that the user has to keep track of and pass in to the log_*_end call.

@hwchase17 (Collaborator) left a comment

Okay, some comments on the usage pattern:

1 (nit): seems like some stuff is slightly off on the return? outputs has an outputs key, but inputs doesn't?

Screen Shot 2022-12-21 at 5 21 40 PM

  1. get_chain_runs returns all chain runs regardless of whether they are at the same level or not. E.g. I run it once and that function returns 2 chain runs (because one is a sub-run of another). I think my query patterns will be: (a) show me all chain runs at the same level, or (b) show me all chain runs (and just all invocations in general in this particular chain run).

  2. More of a structure nit, but get_chain_runs probably doesn't belong on the logger (it's not the job of the logger to fetch results).

  3. Would be good to figure out what the base logger abstraction should be and get that into langchain, so we can just iterate on the logging class but use the up-to-date langchain codebase (this is already a bit out of date).

@hwchase17 (Collaborator)

Comment #2 is probably the only big blocker in the short term.


@abstractmethod
def log_llm_run_start(
self, serialized: Dict[str, Any], prompts: List[str], **extra: str
Collaborator

Is Dict[str, Any] the best form?

What goes in extra?

@agola11 (Owner, Author) commented Dec 23, 2022

Not sure what the best type is for serialized objects. I was assuming they get serialized as JSON, in which case Dict[str, Any] could work.

Extra is for any extra fields that you might want to store. I left it in there for added flexibility, but it's not used right now.

"""Log the start of an LLM run."""

@abstractmethod
def log_llm_run_end(self, response: Dict[str, Any], error=None) -> None:
Collaborator

Practically, is error ever used currently?

Collaborator

Not sure response is the best type; should it just be LLMResult?

Owner Author

error is not used right now, but in the future it would be good to log the error of a particular run if it didn't succeed.

Agree on LLMResult
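For illustration, the revised method might look something like this (a sketch; the LLMResult import path is an assumption):

from abc import ABC, abstractmethod
from typing import Optional

from langchain.schema import LLMResult  # assumed location of LLMResult

class BaseLogger(ABC):
    @abstractmethod
    def log_llm_run_end(self, response: LLMResult, error: Optional[str] = None) -> None:
        """Log the end of an LLM run, recording the error if it did not succeed."""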

self,
serialized: Dict[str, Any],
action: str,
inputs: Dict[str, Any],
Collaborator

Right now, tool inputs/outputs are assumed to be a single string rather than multiple things. If we want to allow multiple here, we should change that in the code as well.

Owner Author

Makes sense, I'll change it to a single string.

@agola11 agola11 closed this Dec 23, 2022
@agola11 agola11 reopened this Dec 23, 2022
@agola11 agola11 closed this Dec 23, 2022