[Question] How to run alpa-serve? #100

Open
SeungsuBaek opened this issue Sep 9, 2023 · 3 comments

@SeungsuBaek

Hi.

I am interested in your nice work.

I want to get a parallel configuration for my server.

I read your code, but it is hard to find documentation or steps for alpa-serve (as opposed to Alpa).

Can you give some advice on running the alpa-serve system on a server?
(That is, how do I use alpa-serve to get a parallel configuration?)

I have already installed the prerequisite packages (Ray and the other Python packages).

@DonMiller9294

Certainly. Here's a summary of the main steps:

Setting up Alpa-Serve for parallel serving on your server involves several key steps:

  1. Configure Alpa-Serve using the provided configuration files, or customize them for your cluster.
  2. Launch Alpa-Serve as a server process; you may need multiple instances to handle parallel tasks efficiently.
  3. Integrate Alpa-Serve with Ray (a prerequisite package) to enable distributed task execution.
  4. Develop a job distribution strategy, scripting the submission of tasks to Alpa-Serve with the desired parallelism level.
  5. Implement monitoring, scaling, error handling, and fault tolerance to keep the system reliable.
  6. Test and benchmark rigorously before deployment, and document the setup to ease maintenance and troubleshooting.

A rough sketch of steps 2-3 is below. If specific issues come up during setup, ask the Alpa-Serve community for support, and tailor the configuration to your server's requirements.
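To make steps 2-3 concrete, here is a minimal sketch on top of Ray. The ray.init call is standard Ray; run_controller and register_model are assumed names for the alpa-serve entry points, so verify them against alpa_serve/controller.py in your checkout:

import ray

# Connect to an existing Ray cluster (start one first with `ray start --head`).
ray.init(address="auto")

# Assumed entry point -- verify against alpa_serve/controller.py.
from alpa_serve.controller import run_controller

controller = run_controller("localhost")   # launch the serving controller process
# controller.register_model(...)           # hypothetical: register the models to serve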

@SeungsuBaek
Author

Thanks for your answer.

Now I can get a model placement using the alpa-serve simulator and run it.

But I have a few more questions about the source code.

def replica_placement_fast_greedy(init_sol: ModelPlacement,
                                  model_datas: List[ModelData],
                                  cluster_env: ClusterEnv,
                                  workload: Workload,
                                  evaluator: PlacementEvaluator,
                                  verbose: int):
    """Use a fast greedy heuristic to place replicas on groups."""
    tic = time.time()

    if evaluator is None:
        evaluator = PlacementEvaluator(model_datas, cluster_env, workload,
                                       "fast_simulator", False)

    # Load constants
    num_models = len(model_datas)
    num_groups = len(init_sol.group_configs)
    mem_budget = cluster_env.mem_budget
    group_configs = init_sol.group_configs

    weight_mem = {}  # Dict[parallel_config -> [model_idx -> weight_mem]]
    for parallel_config in init_sol.group_configs:
        weight_mem[parallel_config] = [
            max(x.profiling_result.para_dict[parallel_config].weight_mem)
            if parallel_config in x.profiling_result.para_dict
            else inf
            for x in model_datas]

    # Greedy placement
    sol = init_sol
    it = 0

    while True:
        stats = evaluator.get_stats([sol])[0]
        overall_goodput, goodputs, group_num_requests, fullstats = stats

        # Find the most unserved model and the most available group
        model_num_unserved = [
            (s.num_requests * (1 - goodput))
            for s, goodput in zip(fullstats.per_model_stats, goodputs)]
        #model_num_unserved = [
        #    (x.rate * (1 - goodput))
        #    for x, goodput in zip(model_datas, goodputs)]
        model_ids = np.argsort(model_num_unserved)[::-1]
        group_ids = np.argsort(group_num_requests)
        group_mem = [
            sum(weight_mem[c][m_id] for m_id in group_ms)
            for c, group_ms in zip(sol.group_configs, sol.group_models)
        ]

        found = False
        for g_id in group_ids:
            c = sol.group_configs[g_id]
            for m_id in model_ids:
                if (m_id not in sol.group_models[g_id] and
                        weight_mem[c][m_id] + group_mem[g_id] <= mem_budget):
                    found = True
                    break
            if found:
                break

        if not found:
            break

        sol = sol.add_model(g_id, m_id).normalize()

        if verbose >= 2:
            print(f"iter: {it}, score: {overall_goodput:.4f}, "
                  f"elapsed: {time.time() - tic:.2f}, "
                  f"best placement: {sol}, ")

        it += 1

    return sol
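For context, this is roughly how I drive the function from my test script. It is only a sketch: the ClusterEnv, ParallelConfig, and ModelPlacement constructor arguments below are simplified placeholders based on my reading of the code, not the exact alpa-serve signatures, and model_datas/workload come from my profiling setup:

GB = 1 << 30                                         # bytes (placeholder constant)
cluster_env = ClusterEnv(num_devices=8, mem_budget=16 * GB)
init_sol = ModelPlacement(group_configs=[ParallelConfig(1, 1, 2)] * 2,
                          group_models=[[], []])     # two empty device groups
sol = replica_placement_fast_greedy(init_sol, model_datas, cluster_env,
                                    workload, None, verbose=2)
print(sol.group_models)                              # model ids placed on each group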

  1. What is the meaning of mem_budget? Is it the maximum memory of the whole cluster (all groups), or the maximum memory of a single group?

  2. If I use bert-6.7B, weight_mem for bert-6.7B shows 6.7 GB. I expected it to be about 13 GB, as in your paper (see the back-of-the-envelope calculation below). Can you explain how this memory constraint is calculated?
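For reference, my 13 GB expectation is just the parameter count at fp16 precision (my own back-of-the-envelope calculation, not a number taken from the code):

num_params = 6.7e9                          # bert-6.7B parameter count
bytes_per_param = 2                         # fp16 weights
print(num_params * bytes_per_param / 1e9)   # ~13.4 GB, matching the paper
# My guess: the 6.7 GB I observe could be the max per-stage weight memory,
# e.g. under a 2-stage pipeline-parallel config, since the code above takes
# max(...) over per-stage weight_mem. Is that right?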

Thanks for reading my questions.

@LiYuTingxxn

Could you please provide a detailed explanation of how to run alpa-serve?
Thanks a lot!
