Update README.md (#656)

tristankincaid · tristankincaid · commit a849f1bda3e5 · 2023-08-03T13:54:41.000+08:00
diff --git a/README.md b/README.md
@@ -17,16 +17,17 @@ Everything runs locally  with no server support and accelerated with local GPUs
 * NVIDIA GPUs via CUDA on Windows and Linux;
 * WebGPU on browsers (through companion project [WebLLM](https://github.com/mlc-ai/web-llm/tree/main)).
 
-**[Click here to join our Discord server!][discord-url]**
-
-**[News] MLC LLM now supports 7B/13B/70B Llama-2 !!**
-
 <ins>**[Check out our instruction page to try out!](https://mlc.ai/mlc-llm/docs/get_started/try_out.html)**</ins>
 
 <p align="center">
   <img src="site/gif/ios-demo.gif" height="700">
 </p>
 
+## News
+
+* [08/02/2023] [Dockerfile](https://github.com/junrushao/llm-perf-bench/) released for CUDA performance benchmarking
+* [07/19/2023] Supports 7B/13B/70B Llama-2
+
 ## What is MLC LLM?
 
 In recent years, generative artificial intelligence (AI) and large language models (LLMs) have made remarkable progress and are becoming increasingly prevalent. Thanks to open-source initiatives, it is now possible to develop personal AI assistants using open-source models. However, LLMs tend to be resource-intensive and computationally demanding. To create a scalable service, developers may need to rely on powerful clusters and expensive hardware to run model inference. Additionally, deploying LLMs presents several challenges, such as the rapid pace of model innovation, memory constraints, and the need for optimization techniques.