API and command line interface for HDFS.
$ hdfscli --alias=dev
Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.
In [1]: CLIENT.list('models/')
Out[1]: ['1.json', '2.json']
In [2]: CLIENT.status('models/2.json')
Out[2]: {
  'accessTime': 1439743128690,
  'blockSize': 134217728,
  'childrenNum': 0,
  'fileId': 16389,
  'group': 'supergroup',
  'length': 48,
  'modificationTime': 1439743129392,
  'owner': 'drwho',
  'pathSuffix': '',
  'permission': '755',
  'replication': 1,
  'storagePolicy': 0,
  'type': 'FILE'
}
In [3]: with CLIENT.read('models/2.json', encoding='utf-8') as reader:
   ...:     from json import load
   ...:     model = load(reader)
   ...:
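Under the hood, each `CLIENT` call maps to a request against the namenode's WebHDFS REST endpoint. As an illustrative sketch (the host and port below are assumptions, and the real client also handles redirects, authentication, and error responses), the calls above correspond to URLs like:

```python
from urllib.parse import urlencode

# Hypothetical namenode endpoint; WebHDFS is served under /webhdfs/v1.
NAMENODE = "http://localhost:50070"

def webhdfs_url(path, op, **params):
    """Build the WebHDFS URL for an operation on an HDFS path."""
    query = urlencode({"op": op, **params})
    return f"{NAMENODE}/webhdfs/v1/{path.lstrip('/')}?{query}"

# CLIENT.list('models/') issues a LISTSTATUS request:
list_url = webhdfs_url("models/", "LISTSTATUS")

# CLIENT.status('models/2.json') issues a GETFILESTATUS request:
status_url = webhdfs_url("models/2.json", "GETFILESTATUS")

# CLIENT.read('models/2.json') issues an OPEN request (the namenode
# then redirects to the datanode that actually serves the bytes):
read_url = webhdfs_url("models/2.json", "OPEN")

print(list_url)  # http://localhost:50070/webhdfs/v1/models/?op=LISTSTATUS
```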
- Python 3 bindings for the WebHDFS (and HttpFS) API, supporting both secure and insecure clusters.
- Command line interface to transfer files and start an interactive client shell, with aliases for convenient namenode URL caching.
- Additional functionality through optional extensions:
  - `avro`, to read and write Avro files directly from HDFS.
  - `dataframe`, to load and save Pandas dataframes.
  - `kerberos`, to support Kerberos authenticated clusters.
See the documentation to learn more.
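The `--alias=dev` flag used in the shell session above resolves through HdfsCLI's configuration file, by default `~/.hdfscli.cfg`. A minimal sketch, where the URL and user are placeholders for your own cluster:

```cfg
[global]
default.alias = dev

[dev.alias]
url = http://localhost:50070
user = drwho
```

With `default.alias` set, plain `hdfscli` (no `--alias` flag) uses the `dev` alias automatically.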
$ pip install hdfs
Then hop on over to the quickstart guide. A Conda feedstock is also available.
HdfsCLI is tested against both WebHDFS and HttpFS. There are two ways of running tests (see `scripts/` for helpers to set up a test HDFS cluster):
$ HDFSCLI_TEST_URL=http://localhost:50070 pytest # Using a namenode's URL.
$ HDFSCLI_TEST_ALIAS=dev pytest # Using an alias.
We'd love to hear what you think on the issues page. Pull requests are also most welcome!