Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Unifying Object Protocol in the Stack #4116

Closed
2 of 3 tasks
tqchen opened this issue Oct 13, 2019 · 8 comments
Closed
2 of 3 tasks

[RFC] Unifying Object Protocol in the Stack #4116

tqchen opened this issue Oct 13, 2019 · 8 comments

Comments

@tqchen
Copy link
Member

tqchen commented Oct 13, 2019

We start to introduce many useful reference-counted objects in the stack, including:

  • runtime::Object used by virtual machine, needed in runtime.
  • Node used to store ASTs.

These objects have quite a lot in common, as a matter of fact, the implementation
of Object and Node system is roughly the same as Node.
(or tag if we use the terminology tagged-union objects). Besides these two objects runtime::NDArray is also quite similar, and could potentially be unified.

This is an RFC to discuss for us to unify some of these(at least Node and Object) implementations into a single one. Doing so would give us a more unified runtime system that can help us to clearly define an extensive runtime for deep learning models.

It also allows us to have reusable containers such as Array and Map that can contain both runtime objects and Node.

Features and High-level Considerations

Features that benefits all objects

  • F1: reference counted object.
  • F2: self-managing (have a customized deleter) that can be called upon release.
  • F3: being future compatible to new allocation strategy(e.g. allocate object from a pool)
  • F4: runtime type casting: we want to be able to check the object type during runtime and cast them.
// Example of runtime type casting.
void Process(Expr ast) {
  if (const AddNode* ptr = ast.as<AddNode>()) {
    // Logic for add
  } else if (const SubNode* ptr = ast.as<SubNode>()) {
    // Logic for sub
  }
}

Features that needed by Node(AST), they are implemented by additional
virtual functions in the Node base class.

  • F5: Support dynamic type index allocation.
    For AST nodes, we want to add new ones as we go, we support that in Node
    by dynamically allocate the type index during runtime keyed by an unique type_key string.
  • F6: Support for recursive serialization and front-end access.
    We support serialization of arbitrary AST node via a visitor pattern that every node implements.

Features that needed by runtime Object. Note that one key requirement for runtime is minimum code size and dependency.

  • F7: static type index if possible. For a few core data structures, we want to statically
    decide their type_index as constant, to make sure we have the most efficient
    and minimum code.

Finally, we want to make objects close to POD-C types if possible. That will open path for more native language integrations in other languages(e.g. rust) that goes beyond a simple frontend API based integration. We can also start to think about introducing a DSL that allows us to generate the
implementation and code in different languages. Unifying the object protocol will open path towards
that goal, but automatic code generation for objects is beyond the scope of this RFC.

Object Protocol Proposal

This section summarizes the base object protocol proposal.
It contains three fields:

  • A type_index that can be used for runtime type checking(see more details about type checking in next section).
  • A 32 bit reference counter.
  • A deleter function
class Object {
 public:
  /*!
   * \brief Object deleter
   * \param self pointer to the Object.
   */
  typedef void (*FDeleter)(Object* self);
  /*!
   * Check if the object is an instance of TargetType.
   * \tparam TargetType The target type to be checked.
   * \return Whether the target type is true.
   */
  template<typename TargetType>
  inline bool IsInstance() const;

 protected:
  // The fields of the base object cell.
  /*! \brief Type index(tag) that indicate the type of the object. */
  uint32_t type_index_{0};
  /*! \brief The internal reference counter */
  RefCounterType ref_counter_{0};
  /*!
   * \brief deleter of this object to enable customized allocation.
   * If the deleter is nullptr, no deletion will be performed.
   * The creator of the object must always set the deleter field properly.
   */
  FDeleter deleter_ = nullptr;
};

The object header is in total 16 bytes. It is an intrusive pointer object with an additional deleter and type_index.

The existence of deleter_ gives more flexibility about where the object comes from. For example, if the object has a POD-C ABI signature, in theory we could allocate it from one DLL(or another language e.g. rust) pass to another dll, while still allows us to call the correct deleter.

Future Compatibility for Special Allocators

Although we do not do it now, we want to think about the future possibility of
introducing customized allocators. The following code gives an example of an object pool.

class ObjectPool {
 public:
   // make an new object.
   ObjectPtr<A> make();
   // release object to the pool.
   void Release(A* obj);
};

void Example(ObjectPool* pool) {
  // allocate from the pool
  ObjectPtr<A> ptr1 = pool->make();

  // Triggers ptr1.reset() when ptr1 goes out of scope.
  // instead of calling delete, we send the object goes back to obj_pool
  // Equivalent to calling: pool->Release(ptr1.get());
}

If there is a global singleton reference to the pool, we can simply set the deleter_ function to call CurrentPool()->Release. If the object pool is created during runtime, and we don't have a global singleton, we need one additional field in addition to the object, and make the object pool allocate the chunk instead.

template<typename A>
class ObjectChunk {
 public:
   A data;
   ObjectPool* pool_;
}

template<typename A>
void Deleter(Object* obj) {
  auto* ptr = static_cast<ObjectChunk<A> >(obj);
  ptr->pool_->Release(ptr);
}

Note that we only need this if we want the object to go back to a specific pool. That is why the pool_ field is not introduced in the Object protocol, but can be added by a subsequent implementation of object pool if necessary.

Type Hierachy and Runtime Type Checking

One of the design choices that can be discussed is how to implement the type hierachy and type checking. There are several ways to do so and each of them has pros and cons.
Specifically, how can we implement the IsInstance API.

// Example of usage of IsInstance

class A : public Object {
};
class BaseB : public Object {
};
class C: public BaseB {
};

void Example() {
  // type-erased object
  ObjectPtr<Object> aptr = make_object<A>();
  ObjectPtr<Object> cptr = make_object<C>();
  // easy, check if aptr->type_index_ == A::type_index()
  CHECK(aptr->IsInstance<A>());
  CHECK(cptr->IsInstance<C>());
  // harder cannot simply do equivalence checking.
  CHECK(cptr->IsInstance<BaseB>());
}

See the above code example, things are easy if we just care about
whether an object is a terminal(final) type.
We can just check the type index.
It becomes harder if we want to support checking of an intermediate
base class (BaseB) in the above case.

There are three choices:

  • A1: Introduce _type_child_slots attributes for BaseB, make sure all its children's type are in a fixed range.
    We can assign A::type_index = 1, BaseB::type_index=2, BaseB::type_child_slots=1, C::type_index=3
    Make sure the type indices in [2, 4) only contain sub-class of BaseB
    • Fast to type check, but has to declare the maximum number of children before-hand and pre-allocate the type index slots.
  • A2: Create a global type hierachy table, that has information about each type's parent type
    • Works for all cases, no need to pre-declare number of child classes.
    • Might need locking the global type table if it can be dynamically updated during runtime.
  • A3: Create a chain of vtable during type construction.
    • Used by current Node
    • Need to create vtable, or introduce virtual function to the Node.

A1 gives the fatest code, but requires additional information(pre-reserve the child slots). Which could be fine for a project with a fixed type hierachy, but might be painful to allow new plugins that contains new types. A2 is the most flexible one, but could be slow(due to access to global type table and lock). A3 needs virtual function and vtable support(means additional fields in the Object),
which may or may not be what we want(if we want to move toward pure C ABI).

My current take is to support a mix of A1 and A2: allow each type declare _type_child_slots, but also allows
additional children that overflows the slots. During type checking, we first check if the type_index is
within the range(fast path), if not, we go to the global type table(slower path for rare cases).
It would be great to get more thoughts on this matter.

Unifying NDArray as an Object

Another debatable point is whether we want to unify NDArray as an Object. The advantage is clear, we could reuse the containers(Array) and remove runtime VM wrapper. This however, will mean we need to change the signature of NArray, it also brings an interesting question about how to handle sub-classes of NDArray and type checks them. We could defer the decision to a later point.

Path for Upgrade

  • Introduce the new object protocol, use it for the old runtime::Object.
  • Redirect Node to be sub-class of the new Object, redirect the functions.
  • Finish the tests and remove redundant API if any.
@tqchen
Copy link
Member Author

tqchen commented Oct 13, 2019

@tqchen
Copy link
Member Author

tqchen commented Oct 13, 2019

PR for the new object impl #4115

@kevinthesun
Copy link
Contributor

Just curious, is this change also related to relay/tvm node system unification?

@tqchen
Copy link
Member Author

tqchen commented Oct 13, 2019

The change is mainly need for runtime object infrastructure it unifies relay vm's object with the tvm's AST node. The unification of relay.module and tvm.module would be in a separate topic, but this PR is a step towards that

@icemelon
Copy link
Member

icemelon commented Oct 14, 2019

I have one question about _type_child_slots. If the child class is a base class for others and also defines _type_child_slots, what will happen if it overflows its parent _type_child_slots or the parent class doesn't specify _type_child_slots?

@tqchen
Copy link
Member Author

tqchen commented Oct 14, 2019

Re overflow, the child that has class slots will count as number of slots rather than a single class, as in the RFC code. The overflow behavior is to allocate the slots in the end of the type table.

@tqchen
Copy link
Member Author

tqchen commented Oct 20, 2019

#4161

@tqchen
Copy link
Member Author

tqchen commented Oct 24, 2019

Node System Refactor Proposal

This proposal is part of the unified object protocol RFC. The node refers to the base object structure for constructing AST/IR nodes as well as utilities manipulating them in the TVM stack.

Historically, the node folder contains an implementation of a smart pointer(NodePtr), a base object(Node) and a base reference type(NodeRef). On top of these data structures, we provide runtime checking and conversion. These features are now replaced by runtime::Object which allows further reuse by other runtime object types.

Besides the basic features provided by runtime::Object. We also provided additional features for AST/IR nodes. The important ones include:

  • Reflection: being able to quickly access any field of the node into frontends(e.g. python).
  • Serialization: any node objects can be serialized to json.
  • Pretty printing: we can print out most of the node objects.

On one side, we could just put support for these features into runtime(e.g. make them part of Object). However, we need to keep in mind that the runtime needs to be minimum. Additionally, these features are mainly for building compiler infra.

Some of these features are previously scattered in the project, outside the node directory.
We propose to consolidate them into a centralized location -- the node folder.
These features are implemented in a dialect independent manner and serves as base infra for compiler.

Remove vtable from Base Node

The current implementation of node reflection system requires child classes to override a virtual function VisitAttrs. As mentioned in the unified object protocol proposal, one of our design goal is to remove vtable from common AST/IR Node classes. By doing so, these node classes can be directly exposed to other languages via C ABI -- which allows direct access to the fields of the node.

We offer an alternative solution to the c++ vtable by a columnar dispatch table indexed by the type index of the object. The data structure, ReflectionVTable, is a global singleton that keeps record of the key reflection-related functions for each object type. These functions are registered by TVM_REGISTER_NODE_TYPE. Note that we already use a similar data structure for node creation, we propose to unify them together. This change does requires a refactor to remove the final keyword from VisitAttrs functions(as they are no longer virtual).

We do not, however, forbid use of virtual functions for all node sub-classes. We only aim to expose the core IR/AST nodes for now. The sub-classes with virtual functions will still work fine as ususal as long as we do not want to expose them via ABI.

PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants