Changelog
~~~ Version 2.0.1
* Reduced container overhead by 4 files and 2 directories: combined access
and creator, removed version dir and combined all 3 version files into one,
combined openhosts and meta dir
* Fixed a bug in the bitmap in ad_plfs_open where some compilers don't
correctly handle bits. Now we do bitmaps ourselves instead of relying on the compiler.
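A minimal sketch of doing the bitmap by hand with explicit shifts and masks, so the layout does not depend on how a compiler packs bits. The helper names are hypothetical and are not the ones used in ad_plfs_open.

```python
def make_bitmap(nbits: int) -> bytearray:
    """Allocate a zeroed bitmap large enough to hold `nbits` bits."""
    return bytearray((nbits + 7) // 8)


def set_bit(bitmap: bytearray, index: int) -> None:
    """Set bit `index` using an explicit byte offset and mask."""
    bitmap[index // 8] |= 1 << (index % 8)


def clear_bit(bitmap: bytearray, index: int) -> None:
    """Clear bit `index`; the 0xFF mask keeps the value within one byte."""
    bitmap[index // 8] &= ~(1 << (index % 8)) & 0xFF


def bit_is_set(bitmap: bytearray, index: int) -> bool:
    """Return True if bit `index` is set."""
    return bool(bitmap[index // 8] & (1 << (index % 8)))
```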
~~~ Version 2.0
* Got the distributed metadata hashing working. Removed map from plfsrc
~~~ Version 1.1.8
* Fixed the getattr bug (we were just getting the attr from the openfile
handle and not getting it from the Container as well; this made the size
reasonable but the permissions/ownership/everything else were garbage).
This also required a bit of smarts: the full getattr was really slowing
things down in LANL's fs_test program, which does a file stat before the
close, and this was making rank 0 descend the entire container structure and
build an index in order to figure out the size. Now we added a lazy flag;
ad_plfs_fcntl passes the lazy flag and the size comes from the open file
handle (this won't be perfectly accurate but it will show incremental
progress and won't be expensive to query).
* fixed ADIO parse conf error. Now a good error is reported when the
conf file is bad or the file path is bad.
* added global_summary_dir to the plfsrc
* fixed tools/plfs_flatten_index which broke due to the stop_buffering
thing we put into the Index to stop buffering when memory is exhausted.
the buffering is so everyone retains their write index in memory to do
the flatten on close in ADIO
~~~ Version 1.1.7
Fixed a bug in rename.
~~~ Version 1.1.6
Added support for a statfs override in the plfsrc file
in response to ticket 35609.
~~~ Version 1.1.5
Bug fix for symbolic links. I swear this is the second time I fixed this bug
~~~ Version 1.1.4
Bug fix in the multiple mount point parsing (uninitialized string pointer)
~~~ Version 1.1.3
Added the multiple mount point parsing in plfsrc
~~~ Version 1.1.2
Index flattening in ADIO close for write
~~~ Version 1.1.1
Index broadcast in ADIO open for read
~~~ Version 1.0.0.0pre.0.C
No new features or performance engineering. Just bug fixes.
All the bug fixes we've made since the last release:
31207 unlink non-atomic right now resolved plfs adamm 0
[email protected] 2 weeks ago 8 days ago 0
- We now rename the container to a hidden path before unlinking since
the recursive unlink over a container is not atomic.
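A minimal sketch of the rename-then-unlink idea, assuming a container is a plain directory on the backend. The hidden name prefix is a made-up placeholder, not the name PLFS actually uses.

```python
import os
import shutil
import uuid


def remove_container(path: str) -> None:
    """Hide the container with one atomic rename, then delete it at leisure."""
    parent, name = os.path.split(os.path.abspath(path))
    # Hypothetical hidden name; a random suffix avoids collisions.
    hidden = os.path.join(parent, f".plfs_unlinked.{name}.{uuid.uuid4().hex}")
    os.rename(path, hidden)   # atomic: the logical file vanishes in one step
    shutil.rmtree(hidden)     # recursive and non-atomic, but no longer visible
```

If the recursive removal is interrupted, only the hidden directory is left behind, so readers never see a half-deleted container under the original name.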
31236 reproducible rename bug resolved plfs Nobody 0
[email protected] 2 weeks ago 2 weeks ago 0
- Simple bug in rename. PLFS has to remove a container in the destination
spot before doing a rename since the rename system call can clobber an
existing file but not an existing directory. So if the user wants to rename
over an existing PLFS file, we have to remove the container first. We had
a bug where we were removing the src instead of the dest.
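A sketch of the corrected ordering, assuming containers are backend directories. On POSIX systems rename cannot replace a non-empty directory, so the destination container is removed first. The function name is hypothetical.

```python
import os
import shutil


def rename_container(src: str, dst: str) -> None:
    """Rename the container at src over dst, replacing any container at dst."""
    if os.path.isdir(dst):
        shutil.rmtree(dst)    # remove the destination, never the source
    os.rename(src, dst)
```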
31253 plfsrc improvement resolved plfs Nobody 0
[email protected] 2 weeks ago 8 days ago 0
- Make it so that if there is no map defined in the plfsrc file, it
defaults to:
map MNT:BACK
31346 Open files on rename resolved plfs johnbent 0
[email protected] 2 weeks ago 2 weeks ago 8 days ago 0
- The problem was that the rename was creating an open file in the open_files
cache with the relative name but the open_files cache should be for absolute
names. Then when the remove file was asking the open_files cache to remove it,
it wasn't found (i.e. absolute != relative). So now rename correctly inserts
the absolute path into the open_files cache.
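One way to make this class of bug hard to reintroduce is to normalize every key inside the cache itself. The class below is a hypothetical illustration, not the PLFS open_files code, and it resolves relative names against the current working directory purely for demonstration.

```python
import os


class OpenFiles:
    """Hypothetical open-file cache keyed strictly by absolute path."""

    def __init__(self) -> None:
        self._files = {}

    @staticmethod
    def _key(path: str) -> str:
        # Every lookup goes through here, so relative and absolute
        # spellings of the same path map to the same entry.
        return os.path.abspath(path)

    def insert(self, path: str, handle) -> None:
        self._files[self._key(path)] = handle

    def remove(self, path: str):
        """Remove and return the handle, or None if the path is not cached."""
        return self._files.pop(self._key(path), None)

    def rename(self, old: str, new: str) -> None:
        """Move an open file's entry to its new name."""
        handle = self.remove(old)
        if handle is not None:
            self.insert(new, handle)
```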
31348 symlink not working resolved plfs Nobody 0
[email protected] 2 weeks ago 2 weeks ago 0
- We now handle sym links correctly. We just stash the logical front-end path
into the backend symlink. Then it just works. On the plfs_getattr though, we
now have to do two stats: one of the entry to see if it's a symlink, ENOENT,
a file, or a directory. If it's a directory, we then have to check for the
access file.
31367 weird bug with relative paths in f_symlink (FUSE problem?) resolved plfs Nobody 0
[email protected] 2 weeks ago 9 days ago 0
- This was nothing. I had forgotten that symlinks can actually be relative.
31375 can't remove a read-only file resolved plfs Nobody 0
[email protected] 13 days ago 9 days ago 0
- Fixed this by making dirMode() always make files owner-writable, because
a file can only be deleted if the parent dir is writable. This means we
can only remove a container if the top-level dir is writable. It also
means that a user can't protect a file from being written by themselves. We
can live with that.
31421 svn co times out resolved plfs Nobody 0
[email protected] 9 days ago 40 hours ago 0
- This was a problem with O_RDWR files. We started creating the index on opens
for reads to check for permissions problems. However, for O_RDONLY, we can
cache this index, but not for O_RDWR. We were however incorrectly caching it
for O_RDWR and this was causing problems when reads wouldn't get newly written
data.
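The caching rule can be expressed as a check on the access mode of the open flags. This is a sketch with a hypothetical function name, not the PLFS code.

```python
import os


def may_cache_index(open_flags: int) -> bool:
    """Only a read-only open may reuse a cached index.

    With O_RDWR the same handle can write new data, so a cached index
    would go stale and reads would miss the newly written data.
    """
    return (open_flags & os.O_ACCMODE) == os.O_RDONLY
```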
31424 weird issue with using an executable made in a plfs mountpoint on centos resolved plfs samuel 0
[email protected] 9 days ago 9 days ago 8 days ago 0
- Something weird where we built on centos:
We build plfs itself in a plfs mount point. We then try to run that
plfs executable and get this error:
cannot restore segment prot after reloc: Permission denied
we then do this:
/usr/sbin/setenforce 0
And it works. I think this is not a big deal, we know a workaround, and
it might actually work better on CentOS since we fixed some other bugs that
were causing build issues on other distributions. So this might be fixed now
and if not, there's an easy workaround.
31427 can't turn off debugging info resolved plfs johnbent 0
[email protected] 9 days ago 8 days ago 0
- Silly issues with setting DEFINES in the configuration code.
31587 umask breaking our 777 dropping thing resolved plfs Nobody 0
[email protected] 3 days ago 3 days ago 0
- We need to set droppings to 777 so that they can also be modified. Otherwise,
some other user can write to a file and create droppings and then the original
owner can't remove the file. But the umask was sometimes getting applied as
well and the droppings weren't always 777. So now we set the umask to 0 in
WriteFile::openFile (and then restore it).
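A minimal sketch of the save/clear/restore pattern around file creation. The function name is hypothetical; the real change lives in WriteFile::openFile.

```python
import os


def open_dropping(path: str, mode: int = 0o777) -> int:
    """Create a dropping whose permission bits are exactly `mode`."""
    old_mask = os.umask(0)        # clear the umask so no bits are stripped
    try:
        return os.open(path, os.O_CREAT | os.O_WRONLY, mode)
    finally:
        os.umask(old_mask)        # always restore the caller's umask
```

Note that the umask is a process-wide setting, so other threads creating files during this window would also see a umask of 0.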
31616 truncate bug resolved plfs Nobody 0
[email protected] 3 days ago 40 hours ago 0
- We haven't seen this in a long time. We think it got magically fixed by some
other bug fixes.
31639 rmdir error seen on cuda resolved plfs Nobody 0
[email protected] 2 days ago 47 hours ago 0
- This was strange and annoying since it was old code that had never had
problems before on other systems. But then something about FUSE on cuda was
different. The problem was that on opendir, we were doing the actual readdir and
caching it. Then on readdir, we'd return the cached dirents. But what was
happening was:
opendir
readdir from offset 0 to end
unlink entry
readdir from offset 0 to end
stat entry
ENOENT
So now we remove the dirents after we do a complete readdir to end. Then if a
new readdir comes in, we refetch the dirents. Therefore, we get the newly
revised set of dirents following the unlink so we don't return a deleted entry
on the readdir.
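The fix can be sketched as a directory handle that drops its cached entries once a read reaches the end. The class and its offset/count interface are hypothetical simplifications of the FUSE readdir path.

```python
import os


class DirHandle:
    """Hypothetical directory handle that caches entries between readdirs."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries = None       # fetched lazily, not at opendir time

    def readdir(self, offset: int, count: int) -> list:
        if self.entries is None:
            self.entries = sorted(os.listdir(self.path))   # (re)fetch
        chunk = self.entries[offset:offset + count]
        if offset + count >= len(self.entries):
            self.entries = None   # read reached the end: drop the cache
        return chunk
```

A second pass from offset 0 after an unlink therefore refetches the listing and no longer returns the deleted entry.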
31656 permission error on open resolved plfs Nobody 0
[email protected] 2 days ago 22 hours ago 0
- This was a long weird one, but the resolution was that Gary had code that
was calling open with O_CREAT but no mode, which resulted in very weird
permissions. That by itself is fine: on other file systems the file simply
gets created with weird permission bits. But PLFS was failing to create the
file because it was putting the weird permission bits on the container and
was then unable to create the droppings. We just weren't adding S_IRUSR in
the dirMode() call.
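A rough sketch of what a dirMode()-style helper might do: take the mode requested for the logical file and force the owner bits the container directory needs. The owner-read and owner-write bits come from the fixes described in this changelog; adding owner-execute (search) is an assumption made here, since a directory needs it before entries can be created inside it.

```python
import stat


def dir_mode(file_mode: int) -> int:
    """Derive a container directory mode from the requested file mode."""
    # Owner must be able to list (r), modify (w), and enter (x) the
    # container; the x bit is this sketch's assumption.
    return file_mode | stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
```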
31704 rename of an open file causes segfault on next read of .plfsdebug resolved plfs Nobody 0
[email protected] 42 hours ago 40 hours ago 0
- This was related to the other rename of an open file. The rename was
removing the plfs fd but not removing the entry in the open_files cache. Then
when we iterated down it, we dereferenced a dangling pointer. The fix was
making sure that we keep the set of open files consistent with the open_files
cache. We screwed this up by inserting a relative path and then trying to
remove an absolute one.
31720 can't build plfs in plfs [email protected] 40 hours ago 14 hours ago 0
- This is fixed now. The problem was that truncate wasn't cleaning up meta
droppings so the stat was wrong and reads that pre-allocated a buffer
according to the stat were finding garbage in the buffer. It works now for
truncates to 0 because it just unlinks all the metadata droppings. However, I
wonder if, in this case, we should probably set the mtime of the access file.
It doesn't yet work for truncates to the middle of the file. For these, we have
to selectively unlink/modify metadata droppings.
~~~ Version 1.0pre.0
1) Problems with building the PLFS SC09 paper. Problems are in data/summary
directory. When make does ./get_data > data, it was failing due to a bad
reference count because we were incrementing the writers when a child came
in and used the parent's fd. Then when the file was closed, it asserted since
the reference count was still 1. So now we don't increment the writers for
children but now we're still seeing this other strange behavior:
jvpn-132-65:/tmp/plfs_mnt/SC09/data/summary>./get_data >! data
cat: stdout: Result too large
2) plfs_read is now multi-threaded when a logical read spans multiple physical
data chunks. In addition, if these chunks need to be opened, that is done in
the threads as well, so we parallelize both the opens and the reads.
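The idea can be sketched with a thread pool in which each worker both opens and reads its own chunk. The (path, offset, length) tuple layout and the function names are assumptions for illustration, and short reads are not handled.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(chunk):
    """Open one physical chunk file and read its slice (runs in a worker)."""
    path, physical_offset, length = chunk
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, physical_offset)
    finally:
        os.close(fd)


def parallel_read(chunks):
    """chunks: list of (path, physical_offset, length) in logical order."""
    if len(chunks) <= 1:
        # A read within a single chunk does not need threads.
        return b"".join(read_chunk(c) for c in chunks)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # map() yields results in input order, so pieces join correctly.
        return b"".join(pool.map(read_chunk, chunks))
```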
3) Container::populateIndex is now multi-threaded when the container has
more than one index file. This means of course that overlapped writes are
now non-deterministically handled in addition to being undefined as they
were previously.
4) Moved the mapping from a logical file to a physical path into the plfs
library and out of fuse. The mapping consults a plfsrc file in order to
know how to distribute users across multiple backends.
~~~ Version 0.1.6
1) Gary found a bug in the old version running on cuda.lanl.gov. I tried to
reproduce it but found it was no longer in 0.1.5. The test was just doing
a bunch of dd's with increasing offsets and then reading from another node.
Not sure what was causing that bug. Didn't go back and debug old versions.
But while investigating that bug, I found another bug that was caused when
dd would truncate a file and the index file would be truncated
at the first entry and then would be empty. But I was assuming that it
wasn't empty so I was checking the entry before the truncate point to see
if it needed to be partially truncated. But when there was no entry, then
this was tromping memory.
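A sketch of the guarded logic, assuming the index is a list of (logical_offset, length) entries sorted by offset with no overlapping writes. The representation is hypothetical; the point is the emptiness check before looking at the entry ahead of the truncate point.

```python
def truncate_index(entries, offset):
    """Return the index entries that survive a truncate to `offset`."""
    kept = [e for e in entries if e[0] < offset]
    if kept:                      # guard: the index may now be empty
        start, length = kept[-1]
        if start + length > offset:
            # The last surviving entry straddles the truncate point.
            kept[-1] = (start, offset - start)
    return kept
```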
~~~ Version 0.1.5
1) Fixed a bug where temp dirs used to attempt to create the container were
not being deleted when some other node created the container first. The
problem was that we changed from SUID to accessfile to identify a container
and we weren't removing the accessfile in the temporary container before
trying to unlink the temporary container.
2) Added a container/version/$(TAG_VERSION) dropping so in the future we can
make sure to be backward compatible.
~~~ Version 0.1.5
1) Changed the behavior of truncate to not remove droppings but just to truncate
them. This was needed because we discovered that if one proc already has open
handles to droppings, another can do a truncate, and then those droppings
disappear when the handles are closed. This was discovered using the qio
tests which don't do barriers after opens and closes.
2) Fixed reference counting bugs in the Plfs_fd structure. Again this was
discovered due to the lack of barriers in the qio tests. When we had
simultaneous readers and writers in a Plfs_fd structure, some of the closes
were screwing up the reference counting, and other threads' Plfs_fd structures
were being removed out from under them.
3) cvs co was causing problems because it was renaming open files. I thought
I had fixed that before but it cropped up again. Now in the rename, when we
are renaming an open file, we remove the old open file, we change its
internal names, and we add it back again as a new open file. There's a bunch
of comments in plfs_fuse.cpp:f_rename about this as well.
4) Fixed a bug that was caused by layering PLFS's (e.g. running PLFS-MPI onto
a PLFS-MNT) which was causing this problem:
a) the app creates file 'foo'
b) PLFS-MPI does mkdir 'foo' on PLFS-MNT
c) PLFS-MNT does mkdir 'foo' on PLFS-BACK
d) PLFS-MPI touches 'foo/access' on PLFS-MNT
e) PLFS-MNT does mkdir 'foo/access' on PLFS-BACK
f) PLFS-MPI tries to access directory 'foo'
g) PLFS-MNT thinks 'foo' is a container since it has an access entry
h) PLFS-MPI gets confused because what was a directory is now a file
Also, this revealed a bug where PLFS library thought that mmap returned NULL
on error, but really it returns MAP_FAILED, so PLFS library didn't notice when
mmap failed and this caused a segfault. To fix this, the check for the access
file checks both existence and whether it is a file. Now when the PLFS-MPI
does a touch of 'foo/access,' the PLFS-MNT makes a 'foo/access' directory, so
when PLFS-MPI looks at 'foo,' PLFS-MNT doesn't find a 'foo/access' file, so it
treats 'foo' as a directory which is the proper behavior in this confusing
layered case. This also revealed the worse problem that a user can never
create an 'access' entry in a directory because PLFS will start treating that
directory as a container and when it doesn't have the other entries that PLFS
expects in a container, it gets all weird. So to fix this, we have broken
backwards compatibility by changing the name of the access file from 'access'
to '.plfsaccess081173'. At some point, we need to add backward compatibility
checking.
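The container test described in item 4 can be sketched as follows: a directory counts as a container only when the access entry exists and is a regular file. The access file name is taken from the text above; the function name is hypothetical.

```python
import os

ACCESS_FILE = ".plfsaccess081173"


def is_container(path: str) -> bool:
    """True only if the access entry exists and is a regular file."""
    return os.path.isfile(os.path.join(path, ACCESS_FILE))
```

In the layered case, the lower PLFS sees the access entry as a directory, so this check returns False and the path is treated as an ordinary directory.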
5) Fixed it so that the plfs_map in the trace/ directory is correctly built
again. I tried to change it so that it builds from the library only, but it
accesses the low-level index calls, so that doesn't work. So now it builds by
just using all the src/*.o files instead of just trying to link with the
library alone.