Commit f0bfd89

docs: document filters and blobio
1 parent e3d4cfa commit f0bfd89

5 files changed

+122
-5
lines changed

docs/filters.rst

+63
@@ -0,0 +1,63 @@
+**********************************************************************
+Filters
+**********************************************************************
+
+pygit2 supports defining and registering libgit2 blob filters implemented
+in Python.
+
+The Filter type
+===============
+
+.. autoclass:: pygit2.Filter
+   :members:
+
+.. autoclass:: pygit2.FilterSource
+
+Registering filters
+===================
+
+.. autofunction:: pygit2.filter_register
+.. autofunction:: pygit2.filter_unregister
+
+Example
+=======
+
+The following example is a simple Python implementation of a filter which
+enforces that blobs are stored with Unix LF line endings in the ODB, and
+checked out with line endings in accordance with the .gitattributes ``eol``
+setting.
+
+.. code-block:: python
+
+    class CRLFFilter(pygit2.Filter):
+        attributes = "text eol=*"
+
+        def __init__(self):
+            super().__init__()
+            self.linesep = b'\r\n' if os.name == 'nt' else b'\n'
+            self.buffer = io.BytesIO()
+
+        def check(self, src, attr_values):
+            if src.mode == GIT_FILTER_SMUDGE:
+                # attr_values contains the values of the 'text' and 'eol'
+                # attributes in that order (as they are defined in
+                # CRLFFilter.attributes)
+                eol = attr_values[1]
+
+                if eol == 'crlf':
+                    self.linesep = b'\r\n'
+                elif eol == 'lf':
+                    self.linesep = b'\n'
+            else:  # src.mode == GIT_FILTER_CLEAN
+                # always use LF line-endings when writing to the ODB
+                self.linesep = b'\n'
+
+        def write(self, data, src, write_next):
+            # buffer input data in case line-ending sequences span chunk boundaries
+            self.buffer.write(data)
+
+        def close(self, write_next):
+            # apply line-ending conversion to our buffered input and write all
+            # of our output data
+            self.buffer.seek(0)
+            write_next(self.linesep.join(self.buffer.read().splitlines()))
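
The buffering approach used by the `CRLFFilter` example can be tried without pygit2 installed. The following is a minimal sketch under stated assumptions, not pygit2 API: `LineEndingConverter` and `convert` are hypothetical names, and `write_next` is simply any callable that accepts `bytes`. One behavior differs from the example on purpose: `bytes.splitlines()` discards the final line terminator, so the sketch restores it when the input ended with one.

```python
import io


class LineEndingConverter:
    """Hypothetical stand-in mirroring the write()/close() protocol."""

    def __init__(self, linesep=b"\n"):
        self.linesep = linesep
        self.buffer = io.BytesIO()

    def write(self, data, write_next):
        # Buffer only: a b"\r\n" pair may be split across two chunks.
        self.buffer.write(data)

    def close(self, write_next):
        raw = self.buffer.getvalue()
        out = self.linesep.join(raw.splitlines())
        # splitlines() drops the final terminator, so restore it when the
        # input ended with one (an addition relative to the docs example).
        if raw.endswith((b"\n", b"\r")):
            out += self.linesep
        write_next(out)


def convert(chunks, linesep):
    """Feed chunks through the converter and collect everything it emits."""
    out = []
    conv = LineEndingConverter(linesep)
    for chunk in chunks:
        conv.write(chunk, out.append)
    conv.close(out.append)
    return b"".join(out)
```

Calling `convert([b"a\r", b"\nb\n"], b"\n")` exercises the case the buffering exists for: the CRLF pair arrives split across two chunks and is still treated as a single line ending.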

docs/index.rst

+1
@@ -68,6 +68,7 @@ Table of Contents
    config
    diff
    features
+   filters
    index_file
    mailmap
    merge

docs/objects.rst

+14
@@ -123,6 +123,20 @@ creating the blob object:
 .. autofunction:: pygit2.hash
 .. autofunction:: pygit2.hashfile
 
+Streaming blob content
+----------------------
+
+`pygit2.Blob.data` and `pygit2.Blob.read_raw()` read the full contents of the
+blob into memory and return Python ``bytes``. Both return the raw contents of
+the blob and do not apply any of the filters that would be applied upon
+checkout to the working directory.
+
+Raw and filtered blob data can be accessed as a Python Binary I/O stream
+(i.e. a file-like object):
+
+.. autoclass:: pygit2.BlobIO
+   :members:
+
 
 Trees
 =================

pygit2/blob.py

+21
@@ -100,6 +100,27 @@ class BlobIO(io.BufferedReader, AbstractContextManager):
 
     Supports reading both raw and filtered blob content.
     Implements io.BufferedReader.
+
+    Example:
+
+        >>> with BlobIO(blob) as f:
+        ...     while True:
+        ...         # Read blob data in 1KB chunks until EOF is reached
+        ...         chunk = f.read(1024)
+        ...         if not chunk:
+        ...             break
+
+    By default, `BlobIO` will stream the raw contents of the blob, but it
+    can also be used to stream filtered content (i.e. to read the content
+    after applying filters which would be used when checking out the blob
+    to the working directory).
+
+    Example:
+
+        >>> with BlobIO(blob, as_path='my_file.ext') as f:
+        ...     # Read the filtered content which would be returned upon
+        ...     # running 'git checkout -- my_file.ext'
+        ...     filtered_data = f.read()
     """
 
     def __init__(
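
The chunked read loop from the docstring is useful whenever a blob is too large to hold in memory comfortably. As a rough illustration, the sketch below hashes a stream chunk by chunk. It assumes only that the stream behaves like an `io.BufferedReader`, which the docstring says `BlobIO` implements; an `io.BytesIO` wrapped in `io.BufferedReader` stands in for `BlobIO(blob)`, and `sha256_of_stream` is a hypothetical helper, not part of pygit2.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024):
    """Hash a binary stream without reading it into memory all at once."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" signals EOF
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for `BlobIO(blob)`: any buffered binary reader works the same way.
payload = b"x" * 5000
with io.BufferedReader(io.BytesIO(payload)) as f:
    streamed = sha256_of_stream(f)
```

With a real repository, the `with` line would presumably become `with BlobIO(blob) as f:` and the rest would stay unchanged.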

pygit2/filter.py

+23 -5
@@ -32,15 +32,19 @@ class Filter:
     """
     Base filter class to be used with libgit2 filters.
 
+    Inherit from this class and override the `check()`, `write()` and `close()`
+    methods to define a filter which can then be registered via
+    `pygit2.filter_register()`.
+
     A new Filter instance will be instantiated for each stream which needs to
     be filtered. For each stream, filter methods will be called in this order:
 
     - `check()`
     - `write()` (may be called multiple times)
     - `close()`
 
-    Output data should be written to the next filter in the chain during
-    `write()` and `close()` via the `write_next` method. All output data
+    Filtered output data should be written to the next filter in the chain
+    during `write()` and `close()` via the `write_next` method. All output data
     must be written to the next filter before returning from `close()`.
 
     If a filter is dependent on reading the complete input data stream, the
@@ -61,6 +65,13 @@ def check(self, src: FilterSource, attr_values: List[str]):
         `check` will be called once per stream.
 
         If `Passthrough` is raised, the filter will not be applied.
+
+        Parameters:
+            src: The source of the filtered blob.
+            attr_values: The values of each attribute for the blob being
+                filtered. `attr_values` will be a list of values in the
+                same order as the attributes were defined in
+                ``cls.attributes``.
         """
 
     def write(
@@ -74,7 +85,12 @@ def write(
 
         `write()` may be called multiple times per stream.
 
-        Output data should be written to `write_next` whenever it is available.
+        Parameters:
+            data: Input data.
+            src: The source of the filtered blob.
+            write_next: The ``write()`` method of the next filter in the chain.
+                Filtered output data should be written to `write_next` whenever
+                it is available.
         """
         write_next(data)
 
@@ -88,6 +104,8 @@ def close(
         `close()` will be called once per stream whenever all writes() to this
         stream have been completed.
 
-        Any remaining output data should be written to `write_next` before
-        returning.
+        Parameters:
+            write_next: The ``write()`` method of the next filter in the chain.
+                Any remaining filtered output data must be written to
+                `write_next` before returning.
         """
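
The call order described in the `Filter` docstring (`check()`, then `write()` zero or more times, then `close()`) can be illustrated with a small driver. This is a sketch under stated assumptions and does not show how libgit2 or pygit2 actually invoke filters: `Passthrough` here is a local stand-in exception, and `UppercaseFilter` and `run_filter` are hypothetical names introduced only for illustration.

```python
class Passthrough(Exception):
    """Local stand-in for the exception a filter raises to opt out."""


class UppercaseFilter:
    """Toy filter following the check()/write()/close() protocol."""

    def check(self, src, attr_values):
        # Opt out unless the (hypothetical) first attribute value is "true".
        if attr_values[0] != "true":
            raise Passthrough

    def write(self, data, src, write_next):
        # Stateless transform, so output is forwarded as soon as it exists.
        write_next(data.upper())

    def close(self, write_next):
        # Nothing was buffered, so there is nothing left to flush.
        pass


def run_filter(filter_cls, chunks, attr_values, src=None):
    """Hypothetical driver: one instance per stream, methods called in order."""
    out = []
    flt = filter_cls()
    try:
        flt.check(src, attr_values)
    except Passthrough:
        return b"".join(chunks)  # filter not applied; data passes unchanged
    for chunk in chunks:
        flt.write(chunk, src, out.append)
    flt.close(out.append)
    return b"".join(out)
```

A filter that needs the whole input, like the line-ending example in `docs/filters.rst`, would instead buffer in `write()` and emit everything from `close()`.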
