📝 Add csv.Sniffer methods
veit committed Jan 19, 2025
1 parent f0ca03a commit f1b8b1c
Showing 2 changed files with 63 additions and 11 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -23,6 +23,7 @@ emergencies when we need to start branches for older versions.
Added
~~~~~

* 📝 Add csv.Sniffer methods
* 📝 Add the removal of git lfs

`24.3.0 <https://github.com/cusyio/Python4DataScience/compare/24.2.0...24.3.0>`_: 2024-11-03
73 changes: 62 additions & 11 deletions docs/data-processing/serialisation-formats/csv/example.ipynb
@@ -1478,7 +1478,7 @@
{
"data": {
"text/plain": [
"<pandas.io.parsers.readers.TextFileReader at 0x137d11220>"
"<pandas.io.parsers.readers.TextFileReader at 0x116412300>"
]
},
"execution_count": 16,
@@ -1746,16 +1746,67 @@
" print(line)"
]
},
{
"cell_type": "markdown",
"id": "0ed726c4-5e09-4676-bcf0-f78e9f7a10e0",
"metadata": {},
"source": [
"[Sniffer.has_header](https://docs.python.org/3/library/csv.html#csv.Sniffer.has_header) analyses your csv file and returns ``True`` if the first row appears to be a series of column headers.\n",
"\n",
"<div class=\"alert alert-block alert-info\">\n",
"\n",
"**Note:**\n",
"\n",
"This method is only a rough heuristic and can produce both false-positive and false-negative results.\n",
"</div>"
]
},
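The cell above describes `Sniffer.has_header` without showing a call. A minimal sketch of how the heuristic behaves, assuming two small in-memory samples (both hypothetical, the first modelled on the `out.csv` rows printed in this notebook), might look like this:

```python
import csv

sniffer = csv.Sniffer()

# Hypothetical sample modelled on the out.csv rows shown in this notebook.
# The language codes and dates have a constant length that differs from
# the header cells, which the heuristic counts as evidence for a header.
typed_sample = (
    "Title,Language,Publication date\n"
    "Python basics,en,2021-10-28\n"
    "Jupyter Tutorial,en,2019-06-27\n"
    "Jupyter Tutorial,de,2020-10-26\n"
)
detected = sniffer.has_header(typed_sample)

# Hypothetical sample whose data columns have neither a common type nor a
# common length, so the heuristic has nothing to compare the first row to.
free_text_sample = "name,city\nAlice,Berlin\nBob,Hamburg\n"
missed = sniffer.has_header(free_text_sample)

print(detected, missed)
```

With CPython's current implementation the first sample is reported as having a header and the second is not, although both start with one. The second case is the kind of false negative the note warns about.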
{
"cell_type": "markdown",
"id": "a19c05c1-e947-471b-8089-8e36e65b4268",
"metadata": {},
"source": [
"[Sniffer.sniff](https://docs.python.org/3/library/csv.html#csv.Sniffer.sniff) analyses a sample of your csv file and returns a `Dialect` subclass that reflects the parameters it found."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "263a8cb4-4ae1-46f0-963f-9d2df2de45ed",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['', 'Title', 'Language', 'Authors', 'License', 'Publication date', 'doi']\n",
"['0', 'Python basics', 'en', 'Veit Schiele', 'BSD-3-Clause', '2021-10-28', '']\n",
"['1', 'Jupyter Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2019-06-27', '']\n",
"['2', 'Jupyter Tutorial', 'de', 'Veit Schiele', 'BSD-3-Clause', '2020-10-26', '']\n",
"['3', 'PyViz Tutorial', 'en', 'Veit Schiele', 'BSD-3-Clause', '2020-04-13', '']\n"
]
}
],
"source": [
"with open('out.csv') as f:\n",
" dialect = csv.Sniffer().sniff(f.read(1024))\n",
" f.seek(0)\n",
" reader = csv.reader(f, dialect)\n",
"\n",
" for line in reader:\n",
" print(line)"
]
},
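The cell above passes the sniffed dialect straight to `csv.reader`. As a rough sketch of what `sniff` hands back, the following inspects the result for a semicolon-separated sample; the sample string and the candidate list passed to `delimiters` are assumptions for illustration and do not come from `out.csv`:

```python
import csv
import io

# Hypothetical semicolon-separated sample, not taken from out.csv.
sample = "Title;Language\nPython basics;en\nJupyter Tutorial;de\n"

# The optional delimiters argument restricts the candidates that
# the sniffer is allowed to consider.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# sniff returns a class derived from csv.Dialect, not an instance.
is_dialect_subclass = issubclass(dialect, csv.Dialect)
delimiter = dialect.delimiter

# The sniffed dialect can be passed to csv.reader like a built-in one.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(is_dialect_subclass, repr(delimiter))
for row in rows:
    print(row)
```

Restricting the candidates is a design choice that helps when free text in the data contains characters, such as spaces or colons, that the sniffer might otherwise mistake for a delimiter.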
{
"cell_type": "markdown",
"id": "3bc7ee20",
"metadata": {},
"source": [
"### Dialekte\n",
"### Dialects\n",
"\n",
"csv-Dateien gibt es in vielen verschiedenen Varianten. Das Python csv-Modul kommt bereits mit drei verschiedenen Dialekten:\n",
"csv files come in many different variants. The Python csv module already ships with three predefined dialects:\n",
"\n",
"Parameter | excel | excel-tab | unix\n",
"Parameters | [excel](https://docs.python.org/3/library/csv.html#csv.excel) | [excel-tab](https://docs.python.org/3/library/csv.html#csv.excel_tab) | [unix](https://docs.python.org/3/library/csv.html#csv.unix_dialect)\n",
":--- | :--- | :--- | :---\n",
"`delimiter` | `','` | `'\\\\t'` | `','`\n",
"`quotechar` | `'\\\"'` | `'\\\"'` | ` '\\\"'`\n",
@@ -1780,7 +1831,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 27,
"id": "c6d73a1e",
"metadata": {},
"outputs": [],
@@ -1804,7 +1855,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 28,
"id": "85ac6d66",
"metadata": {},
"outputs": [
@@ -1837,7 +1888,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 29,
"id": "341af079",
"metadata": {},
"outputs": [
@@ -1856,7 +1907,7 @@
" 'doi': ('', '', '', '')}"
]
},
"execution_count": 28,
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
@@ -1881,7 +1932,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 30,
"id": "69f3c21a",
"metadata": {},
"outputs": [],
@@ -1895,7 +1946,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 31,
"id": "ff5b4f67",
"metadata": {},
"outputs": [
@@ -1907,7 +1958,7 @@
" '2,Jupyter Tutorial,en,Veit Schiele\\n']"
]
},
"execution_count": 30,
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}