Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
📝Translating docs to Simplified Chinese (#2705)
* 📝Translating docs to Simplified Chinese
* update files
* 📝Translating docs to Simplified Chinese
* 📝Translating docs to Simplified Chinese
* 📝Translating docs to Simplified Chinese
* update files
* 📝Translating docs to Simplified Chinese
* 📝Translating docs to Simplified Chinese
* update files
* translate 'hf_file_system.md'
* update files
1 parent ca3f674, commit 6be2b3e
Showing 3 changed files with 253 additions and 0 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
@@ -0,0 +1,119 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
-->

# Interact with the Hub through the Filesystem API

In addition to [`HfApi`], the `huggingface_hub` library provides [`HfFileSystem`], a Pythonic file interface to the Hugging Face Hub that is compatible with [fsspec](https://filesystem-spec.readthedocs.io/en/latest/). [`HfFileSystem`] is built on top of [`HfApi`] and offers typical filesystem operations such as `cp`, `mv`, `ls`, `du`, `glob`, `get_file`, and `put_file`.

<Tip warning={true}>

[`HfFileSystem`] provides fsspec compatibility, which is useful for libraries that require it (for example, reading Hugging Face datasets directly with `pandas`). However, this compatibility layer introduces additional overhead. For better performance and reliability, use [`HfApi`] methods whenever possible.

</Tip>

## Usage

```python
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem()

>>> # List all files in a directory
>>> fs.ls("datasets/my-username/my-dataset-repo/data", detail=False)
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # List all ".csv" files in a repo
>>> fs.glob("datasets/my-username/my-dataset-repo/**/*.csv")
['datasets/my-username/my-dataset-repo/data/train.csv', 'datasets/my-username/my-dataset-repo/data/test.csv']

>>> # Read a remote file
>>> with fs.open("datasets/my-username/my-dataset-repo/data/train.csv", "r") as f:
...     train_data = f.readlines()

>>> # Read the content of a remote file as a string
>>> train_data = fs.read_text("datasets/my-username/my-dataset-repo/data/train.csv", revision="dev")

>>> # Write a remote file (each row ends with a newline so the CSV stays valid)
>>> with fs.open("datasets/my-username/my-dataset-repo/data/validation.csv", "w") as f:
...     f.write("text,label\n")
...     f.write("Fantastic movie!,good\n")
```

The optional `revision` argument can be passed to run an operation from a specific commit, identified by a branch, a tag name, or a commit hash.

Unlike Python's built-in `open`, `fsspec`'s `open` defaults to binary mode, `"rb"`. This means you must explicitly set the mode to `"r"` to read in text mode or `"w"` to write in text mode. Appending to a file (modes `"a"` and `"ab"`) is not supported yet.

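
The mode rules described above can be captured in a small guard placed in front of `fs.open`. This is a minimal sketch, not part of `huggingface_hub`: `resolve_mode` and `DEFAULT_MODE` are hypothetical names, and the check only mirrors the behavior stated in this section (binary default, no append support).

```python
from typing import Optional

# Assumption taken from the text above: fsspec-style open() defaults to binary read.
DEFAULT_MODE = "rb"


def resolve_mode(mode: Optional[str] = None) -> str:
    """Return the mode to pass to ``fs.open``, rejecting append modes.

    Hypothetical helper for illustration only.
    """
    if mode is None:
        # Unlike the built-in open(), the default here is binary, not text.
        return DEFAULT_MODE
    if "a" in mode:
        # Covers both "a" and "ab", which the text says are not supported yet.
        raise ValueError(f"append mode {mode!r} is not supported")
    return mode
```

A call such as `fs.open(path, resolve_mode("r"))` would then make the text-mode choice explicit at the call site.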

## Integrations

[`HfFileSystem`] can be used with any library that integrates `fsspec`, provided the URL follows this scheme:

```
hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/huggingface_hub/hf_urls.png"/>
</div>

The `repo_type_prefix` is `datasets/` for datasets and `spaces/` for Spaces. Models do not need a prefix in the URL.

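
The URL scheme above can be assembled programmatically. The sketch below is illustrative only: `build_hf_url` and `REPO_TYPE_PREFIXES` are hypothetical names, not `huggingface_hub` APIs, and the helper ignores any special encoding that a revision containing `/` might need.

```python
from typing import Optional

# Prefixes as described above; models take no prefix.
REPO_TYPE_PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def build_hf_url(
    repo_id: str,
    path_in_repo: str,
    repo_type: str = "model",
    revision: Optional[str] = None,
) -> str:
    """Assemble hf://[<repo_type_prefix>]<repo_id>[@<revision>]/<path/in/repo>."""
    if repo_type not in REPO_TYPE_PREFIXES:
        raise ValueError(f"unknown repo_type: {repo_type!r}")
    prefix = REPO_TYPE_PREFIXES[repo_type]
    rev = f"@{revision}" if revision else ""
    # Drop a leading slash so the result never contains "//" after the repo id.
    return f"hf://{prefix}{repo_id}{rev}/{path_in_repo.lstrip('/')}"


print(build_hf_url("my-username/my-dataset-repo", "data/train.csv",
                   repo_type="dataset", revision="dev"))
# prints: hf://datasets/my-username/my-dataset-repo@dev/data/train.csv
```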

Here are some interesting integrations where [`HfFileSystem`] simplifies interacting with the Hub:

* Reading/writing a [Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-writing-remote-files) DataFrame from/to a Hub repository:

```python
>>> import pandas as pd

>>> # Read a remote CSV file into a DataFrame
>>> df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")

>>> # Write a DataFrame to a remote CSV file
>>> df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")
```

The same workflow also applies to [Dask](https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html) and [Polars](https://pola-rs.github.io/polars/py-polars/html/reference/io.html) DataFrames.

* Querying (remote) Hub files with [DuckDB](https://duckdb.org/docs/guides/python/filesystems):

```python
>>> from huggingface_hub import HfFileSystem
>>> import duckdb

>>> fs = HfFileSystem()
>>> duckdb.register_filesystem(fs)
>>> # Query a remote file and get the result back as a DataFrame
>>> fs_query_file = "hf://datasets/my-username/my-dataset-repo/data_dir/data.parquet"
>>> df = duckdb.query(f"SELECT * FROM '{fs_query_file}' LIMIT 10").df()
```

* Using the Hub as an array store with [Zarr](https://zarr.readthedocs.io/en/stable/tutorial.html#io-with-fsspec):

```python
>>> import numpy as np
>>> import zarr

>>> embeddings = np.random.randn(50000, 1000).astype("float32")

>>> # Write an array to a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="w") as root:
...     foo = root.create_group("embeddings")
...     foobar = foo.zeros('experiment_0', shape=(50000, 1000), chunks=(10000, 1000), dtype='f4')
...     foobar[:] = embeddings

>>> # Read an array from a repo
>>> with zarr.open_group("hf://my-username/my-model-repo/array-store", mode="r") as root:
...     first_row = root["embeddings/experiment_0"][0]
```

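
For the Zarr example above, it can help to estimate how the array is split before writing it to a repo. The sketch below works through the arithmetic for the shape, chunk size, and `f4` (4-byte) dtype used there. `chunk_layout` is a hypothetical helper, and the figures are uncompressed sizes; the sketch assumes nothing about how Zarr compresses or stores each chunk.

```python
import math
from typing import Sequence, Tuple


def chunk_layout(shape: Sequence[int], chunks: Sequence[int], itemsize: int) -> Tuple[int, int]:
    """Return (number of chunks, uncompressed bytes in one full chunk)."""
    # Chunks needed along each axis, rounded up, multiplied across axes.
    n_chunks = math.prod(-(-s // c) for s, c in zip(shape, chunks))
    # Elements in one full chunk times the size of one element.
    bytes_per_chunk = math.prod(chunks) * itemsize
    return n_chunks, bytes_per_chunk


n_chunks, chunk_bytes = chunk_layout((50000, 1000), (10000, 1000), itemsize=4)
print(n_chunks, chunk_bytes)  # prints: 5 40000000
```

With these numbers the array is split into 5 chunks of about 40 MB each before compression, for roughly 200 MB in total.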
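
## Authentication

In many cases, you must be logged in with a Hugging Face account to interact with the Hub. Refer to the [Authentication](../quick-start#authentication) section of the documentation to learn more about authentication methods on the Hub.

It is also possible to log in programmatically by passing your token as an argument to [`HfFileSystem`]:

```python
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem(token=token)
```

If you log in this way, be careful not to accidentally leak the token when sharing your source code!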
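
One way to reduce the risk of leaking a token is to keep it out of the source file and read it from the environment at runtime. The sketch below is a minimal illustration: `token_from_env` is a hypothetical helper, and it assumes the token has been exported in an environment variable named `HF_TOKEN`.

```python
import os
from typing import Mapping, Optional


def token_from_env(
    environ: Optional[Mapping[str, str]] = None, var: str = "HF_TOKEN"
) -> Optional[str]:
    """Return the token stored in `var`, or None if it is unset or blank."""
    env = os.environ if environ is None else environ
    token = env.get(var, "").strip()
    return token or None


# Usage (requires huggingface_hub; shown as comments so the sketch runs offline):
# from huggingface_hub import HfFileSystem
# fs = HfFileSystem(token=token_from_env())
```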

@@ -0,0 +1,130 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
-->

# How-to guides

In this section, you will find practical guides to help you achieve a specific goal.
Take a look at these guides to learn how to use huggingface_hub to solve real-world problems:

<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-3 md:gap-y-4 md:gap-x-5">

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./repository">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Repository
</div><p class="text-gray-700">
How do you create a repository on the Hub? How do you configure it? How do you interact with it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./download">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Download files
</div><p class="text-gray-700">
How do you download a file from the Hub? How do you download a repository?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./upload">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Upload files
</div><p class="text-gray-700">
How do you upload a file or a folder? How do you make changes to an existing repository on the Hub?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./search">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Search
</div><p class="text-gray-700">
How do you efficiently search the 200k+ public models, datasets, and Spaces?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./hf_file_system">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
HfFileSystem
</div><p class="text-gray-700">
How do you interact with the Hub through a convenient interface that mimics Python's file interface?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./inference">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Inference
</div><p class="text-gray-700">
How do you make predictions with the accelerated Inference API?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./community">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Community
</div><p class="text-gray-700">
How do you interact with the community (Discussions and Pull Requests)?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./collections">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Collections
</div><p class="text-gray-700">
How do you build collections programmatically?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./manage-cache">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Cache
</div><p class="text-gray-700">
How does the cache system work? How can you benefit from it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./model-cards">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Model Cards
</div><p class="text-gray-700">
How do you create and share Model Cards?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./manage-spaces">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Manage your Space
</div><p class="text-gray-700">
How do you manage your Space's hardware and configuration?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./integrations">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Integrate a library
</div><p class="text-gray-700">
What does it mean to integrate a library with the Hub? How do you do it?
</p>
</a>

<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg"
href="./webhooks_server">
<div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">
Webhooks server
</div><p class="text-gray-700">
How do you create a server that receives Webhooks and deploy it as a Space?
</p>
</a>

</div>
</div>