Commit 5d7ea14

Merge pull request #1236 from vergenzt/gh-1234-include-manpages-in-wheels
feat: make man pages be included in wheels too!
2 parents f73742f + 9209dea commit 5d7ea14

20 files changed: +2488 -29 lines changed

.github/workflows/pypi.yml

+2-3
@@ -10,9 +10,8 @@ jobs:
       - uses: actions/setup-python@v5
         with:
           python-version: '3.10'
-      - run: pip install --upgrade -r docs/requirements.txt
-      - run: make -C docs man
-      - run: pip install --upgrade build
+      - run: pip install --upgrade build sphinx
+      - run: sphinx-build -b man docs man
       - run: python -m build --sdist --wheel
       - name: Publish to TestPyPI
         uses: pypa/gh-action-pypi-publish@release/v1

.gitignore

+7-9
@@ -1,11 +1,9 @@
-*.pyc
-*.swp
 .DS_Store
-build
-csvkit.egg-info
-reference
-dist
+*.pyc
 *.swo
-docs/_build
-.coverage
-cover
+*.swp
+/build
+/dist
+/csvkit.egg-info
+/docs/_build
+/man/.doctrees

CHANGELOG.rst

+1
@@ -1,6 +1,7 @@
 Unreleased
 ----------
 
+- feat: Add man pages.
 - fix: :doc:`/scripts/csvstat` no longer errors when a column is a time delta and :code:`--json` is set.
 
 2.0.0 - May 1, 2024

MANIFEST.in

+1-1
@@ -1,10 +1,10 @@
 include *.rst
 include COPYING
 recursive-include docs *.py
-recursive-include docs *.1
 recursive-include docs *.rst
 recursive-include docs *.txt
 recursive-include docs Makefile
 recursive-include examples *
+recursive-include man *.1
 recursive-include tests *.py
 exclude .readthedocs.yaml

csvkit/utilities/csvpy.py

+1-2
@@ -67,8 +67,7 @@ def main(self):
 
         variable = klass(input_file, **kwargs, **self.reader_kwargs)
 
-        welcome_message = 'Welcome! "{}" has been loaded in an {} object named "{}".'.format(
-            filename, class_name, variable_name)
+        welcome_message = f'Welcome! "{filename}" has been loaded in an {class_name} object named "{variable_name}".'
 
         try:
             from IPython.frontend.terminal.embed import InteractiveShellEmbed
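The `csvpy.py` change above swaps `str.format` for an f-string without altering the message. A minimal check of that equivalence might look like the following; the three sample values are hypothetical stand-ins, since the diff does not show how `filename`, `class_name`, and `variable_name` are set.

```python
# Hypothetical sample values; the real ones are computed earlier in csvpy's main().
filename = "examples/dummy.csv"
class_name = "agate.csv.reader"
variable_name = "reader"

# Old form: positional placeholders filled by str.format.
old_message = 'Welcome! "{}" has been loaded in an {} object named "{}".'.format(
    filename, class_name, variable_name)

# New form: the same text as an f-string with named interpolation.
new_message = f'Welcome! "{filename}" has been loaded in an {class_name} object named "{variable_name}".'

# Both forms should produce an identical string.
assert old_message == new_message
print(new_message)
```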

man/csvclean.1

+307
@@ -0,0 +1,307 @@
+.\" Man page generated from reStructuredText.
+.
+.
+.nr rst2man-indent-level 0
+.
+.de1 rstReportMargin
+\\$1 \\n[an-margin]
+level \\n[rst2man-indent-level]
+level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
+-
+\\n[rst2man-indent0]
+\\n[rst2man-indent1]
+\\n[rst2man-indent2]
+..
+.de1 INDENT
+.\" .rstReportMargin pre:
+. RS \\$1
+. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
+. nr rst2man-indent-level +1
+.\" .rstReportMargin post:
+..
+.de UNINDENT
+. RE
+.\" indent \\n[an-margin]
+.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.nr rst2man-indent-level -1
+.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
+.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
+..
+.TH "CSVCLEAN" "1" "Jul 12, 2024" "2.0.0" "csvkit"
+.SH NAME
+csvclean \- csvclean Documentation
+.SH DESCRIPTION
+.sp
+Reports and fixes common errors in a CSV file.
+.SS Checks
+.INDENT 0.0
+.IP \(bu 2
+Reports rows that have a different number of columns than the header row, if the \fB\-\-length\-mismatch\fP option is set.
+.IP \(bu 2
+Reports columns that are empty, if the \fB\-\-empty\-columns\fP option is set.
+.UNINDENT
+.sp
+\fBTIP:\fP
+.INDENT 0.0
+.INDENT 3.5
+Enable all checks with \fB\-\-enable\-all\-checks\fP (\fB\-a\fP).
+.UNINDENT
+.UNINDENT
+.SS Fixes
+.INDENT 0.0
+.IP \(bu 2
+If a CSV has unquoted cells that contain line breaks, like:
+.INDENT 2.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+id,address,country
+1,1 Main St
+Springfield,US
+2,123 Acadia Avenue
+London,GB
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+Use \fB\-\-join\-short\-rows\fP to attempt to correct the errors by merging short rows into a single row:
+.INDENT 2.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+id,address,country
+1,\(dq1 Main St
+Springfield\(dq,US
+2,\(dq123 Acadia Avenue
+London\(dq,GB
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+To change the string used to join the lines, use \fB\-\-separator\fP\&. For example, with \fB\-\-separator \(dq, \(dq\fP:
+.INDENT 2.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+id,address,country
+1,\(dq1 Main St, Springfield\(dq,US
+2,\(dq123 Acadia Avenue, London\(dq,GB
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.IP \(bu 2
+If a CSV has missing delimiters, like:
+.INDENT 2.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+id,name,country
+1,Alice
+2,Bob,CA
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+You can add the missing delimiters with \fB\-\-fill\-short\-rows\fP:
+.INDENT 2.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+id,name,country
+1,Alice,
+2,Bob,CA
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+\fBTIP:\fP
+.INDENT 2.0
+.INDENT 3.5
+\fI\%csvcut\fP without options also adds missing delimiters!
+.UNINDENT
+.UNINDENT
+.sp
+To change the value used to fill short rows, use \fB\-\-fillvalue\fP\&. For example, with \fB\-\-fillvalue \(dqUS\(dq\fP:
+.INDENT 2.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+id,name,country
+1,Alice,US
+2,Bob,CA
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.UNINDENT
+.sp
+\fBSEE ALSO:\fP
+.INDENT 0.0
+.INDENT 3.5
+\fB\-\-header\-normalize\-space\fP under \fI\%Usage\fP\&.
+.UNINDENT
+.UNINDENT
+.sp
+\fBNOTE:\fP
+.INDENT 0.0
+.INDENT 3.5
+Every csvkit tool does the following:
+.INDENT 0.0
+.IP \(bu 2
+Removes optional quote characters, unless the \fI\-\-quoting\fP (\fI\-u\fP) option is set to change this behavior
+.IP \(bu 2
+Changes the field delimiter to a comma, if the input delimiter is set with the \fI\-\-delimiter\fP (\fI\-d\fP) or \fI\-\-tabs\fP (\fI\-t\fP) options
+.IP \(bu 2
+Changes the record delimiter to a line feed (LF or \fB\en\fP)
+.IP \(bu 2
+Changes the quote character to a double\-quotation mark, if the character is set with the \fI\-\-quotechar\fP (\fI\-q\fP) option
+.IP \(bu 2
+Changes the character encoding to UTF\-8, if the input encoding is set with the \fI\-\-encoding\fP (\fI\-e\fP) option
+.UNINDENT
+.UNINDENT
+.UNINDENT
+.SS Output
+.sp
+\fBcsvclean\fP attempts to make the selected fixes. Then:
+.INDENT 0.0
+.IP \(bu 2
+If the \fB\-\-omit\-error\-rows\fP option is set, \fBonly\fP rows that pass the selected checks are written to standard output. If not, \fBall\fP rows are written to standard output.
+.IP \(bu 2
+If any checks are enabled, \fBerror\fP rows along with line numbers and descriptions are written to standard error. If there are error rows, the exit code is 1.
+.UNINDENT
+.SS Usage
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+usage: csvclean [\-h] [\-d DELIMITER] [\-t] [\-q QUOTECHAR] [\-u {0,1,2,3}] [\-b]
+[\-p ESCAPECHAR] [\-z FIELD_SIZE_LIMIT] [\-e ENCODING] [\-S] [\-H]
+[\-K SKIP_LINES] [\-v] [\-l] [\-\-zero] [\-V]
+[FILE]
+
+Fix common errors in a CSV file.
+
+positional arguments:
+FILE The CSV file to operate on. If omitted, will accept
+input as piped data via STDIN.
+
+optional arguments:
+\-h, \-\-help show this help message and exit
+\-\-length\-mismatch Report data rows that are shorter or longer than the
+header row.
+\-\-empty\-columns Report empty columns as errors.
+\-a, \-\-enable\-all\-checks
+Enable all error reporting.
+\-\-omit\-error\-rows Omit data rows that contain errors, from standard
+output.
+\-\-label LABEL Add a \(dqlabel\(dq column to standard error. Useful in
+automated workflows.
+\-\-header\-normalize\-space
+Strip leading and trailing whitespace and replace
+sequences of whitespace characters by a single space
+in the header.
+\-\-join\-short\-rows Merges short rows into a single row.
+\-\-separator SEPARATOR
+The string with which to join short rows. Defaults to
+a newline.
+\-\-fill\-short\-rows Fill short rows with the missing cells.
+\-\-fillvalue FILLVALUE
+The value with which to fill short rows. Defaults to
+none.
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+See also: \fI\%Arguments common to all tools\fP\&.
+.SH EXAMPLES
+.sp
+Test a file with data rows that are shorter and longer than the header row:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+$ csvclean examples/bad.csv 2> errors.csv
+column_a,column_b,column_c
+0,mixed types.... uh oh,17
+$ cat errors.csv
+line_number,msg,column_a,column_b,column_c
+1,\(dqExpected 3 columns, found 4 columns\(dq,1,27,,I\(aqm too long!
+2,\(dqExpected 3 columns, found 2 columns\(dq,,I\(aqm too short!
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+\fBNOTE:\fP
+.INDENT 0.0
+.INDENT 3.5
+If any data rows are longer than the header row, you need to add columns manually: for example, by adding one or more delimiters (\fB,\fP) to the end of the header row. \fBcsvclean\fP can\(aqt do this, because it is designed to work with standard input, and correcting an error at the start of the CSV data based on an observation later in the CSV data would require holding all the CSV data in memory – which is not an option for large files.
+.UNINDENT
+.UNINDENT
+.sp
+Test a file with empty columns:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+$ csvclean \-\-empty\-columns examples/test_empty_columns.csv 2> errors.csv
+a,b,c,,
+a,,,,
+,,c,,
+,,,,
+$ cat errors.csv
+line_number,msg,a,b,c,,
+1,\(dqEmpty columns named \(aqb\(aq, \(aq\(aq, \(aq\(aq! Try: csvcut \-C 2,4,5\(dq,,,,,
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+Use \fI\%csvcut\fP to exclude the empty columns:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+$ csvcut \-C 2,4,5 examples/test_empty_columns.csv
+a,c
+a,
+,c
+,
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.sp
+To change the line ending from line feed (LF or \fB\en\fP) to carriage return and line feed (CRLF or \fB\er\en\fP) use:
+.INDENT 0.0
+.INDENT 3.5
+.sp
+.nf
+.ft C
+csvformat \-M $\(aq\er\en\(aq examples/dummy.csv
+.ft P
+.fi
+.UNINDENT
+.UNINDENT
+.SH AUTHOR
+Christopher Groskopf and contributors
+.SH COPYRIGHT
+2016, Christopher Groskopf and James McKinney
+.\" Generated by docutils manpage writer.
+.
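The man page above describes `--fill-short-rows`, `--fillvalue`, and the `--length-mismatch` check only in prose and sample output. The sketch below illustrates that documented behavior with Python's standard `csv` module. It is not csvkit's implementation: the function names `fill_short_rows` and `length_mismatches` are hypothetical, and numbering data rows from 1 is an assumption taken from the `errors.csv` example.

```python
import csv
import io


def fill_short_rows(text, fillvalue=None):
    """Pad rows shorter than the header row (illustrative, not csvkit's code)."""
    reader = csv.reader(io.StringIO(text))
    output = io.StringIO()
    # The man page notes that csvkit tools write LF record delimiters.
    writer = csv.writer(output, lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    for row in reader:
        missing = len(header) - len(row)
        if missing > 0:
            # csv.writer writes None as an empty string, matching "Defaults to none".
            row = row + [fillvalue] * missing
        writer.writerow(row)
    return output.getvalue()


def length_mismatches(text):
    """Yield (row_number, message) for rows whose length differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Assumption: data rows are numbered from 1, as in the errors.csv example.
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            yield row_number, f"Expected {len(header)} columns, found {len(row)} columns"


sample = "id,name,country\n1,Alice\n2,Bob,CA\n"
print(fill_short_rows(sample, fillvalue="US"))
print(list(length_mismatches(sample)))
```

Both helpers read one row at a time, in line with the man page's note that `csvclean` is designed to work with standard input rather than hold all the CSV data in memory.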
