Consider inotify integration #39

Open
hoytech opened this issue Aug 8, 2016 · 6 comments

Comments

@hoytech
Owner

hoytech commented Aug 8, 2016

When locking files into memory, we could use inotify (or other interfaces for different OSes) to discover when a file's size or inode changes, and re-map and re-lock the new contents.

See #12 for more discussion on this.
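Here is a rough sketch of how a wrapper outside vmtouch could approximate this on Linux. It assumes inotifywait (from the inotify-tools package) is installed and that the script runs with enough privilege to mlock. The function names, the `VMTOUCH` override and the 2-second debounce are illustrative choices, not vmtouch features. Each restart re-crawls and re-locks the whole directory, so files that were replaced or grown are picked up again.

```shell
#!/usr/bin/env bash
# Sketch only: keep a directory locked with vmtouch and re-lock it after
# inotify reports changes. Assumes Linux, inotifywait (inotify-tools) and
# privileges that allow mlock (e.g. root). Function names are hypothetical.

VMTOUCH=${VMTOUCH:-vmtouch}   # locking command; the override is a hypothetical knob
LOCK_PID=""                   # PID of the current foreground "vmtouch -l"

# Stop the previous lock holder, if any, then start a fresh one so the new
# process re-maps and re-locks the current file contents.
restart_lock() {
  if [[ -n $LOCK_PID ]]; then
    kill "$LOCK_PID" 2>/dev/null || true
    wait "$LOCK_PID" 2>/dev/null || true   # reap it so no zombie is left
  fi
  "$VMTOUCH" -l "$@" &   # without -d, vmtouch -l keeps running and holds the locks
  LOCK_PID=$!
}

# Watch DIR recursively and re-lock after each burst of change events.
watch_and_relock() {
  local dir=$1
  restart_lock "$dir"
  while read -r _; do
    # Drain further events for 2 s so a burst of writes causes one re-lock.
    while read -r -t 2 _; do :; done
    restart_lock "$dir"
  done < <(inotifywait -m -r -q \
             -e close_write -e create -e delete -e moved_to -e moved_from \
             --format '%w%f' "$dir")
}
```

For example, `watch_and_relock /var/htdocs/a` would keep that tree locked until the script is stopped. Because the old process is killed before the new one starts, there is a brief window with nothing locked. The pages normally stay cached through it.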

@timp87

timp87 commented Jun 8, 2017

+1
For example,
inotify for Linux
kqueue for FreeBSD

@hoytech
Owner Author

hoytech commented Jun 8, 2017

It's a shame there's no standardised interface for this. I don't like to have different features supported on different platforms (beyond the virtual memory capabilities provided by the OS of course).
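Even choosing a notification mechanism requires per-platform code. Below is a minimal sketch of that dispatch in shell. It assumes the mapping shown (inotify on Linux, kqueue on the BSDs and macOS) and falls back to periodic polling everywhere else. `watch_backend` is a hypothetical helper name.

```shell
# Sketch: choose a change-notification backend per OS (assumed mapping),
# falling back to periodic polling (e.g. a cron-driven rebuild) elsewhere.
# watch_backend is a hypothetical helper; pass an OS name or let uname decide.
watch_backend() {
  local os=${1:-$(uname -s)}
  case $os in
    Linux) echo inotify ;;
    FreeBSD | OpenBSD | NetBSD | DragonFly | Darwin) echo kqueue ;;
    *) echo poll ;;
  esac
}
```

A caller could then branch on `$(watch_backend)`, for example running an inotify watcher on Linux and a cron-driven rebuild everywhere else.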

@ghost

ghost commented Sep 4, 2018

Hi Doug (@hoytech),

I’m going to add a couple of questions here that all seem related to this. If you want me to separate them out into individual issues I can do that.

Q1:

When a file is locked and the file is then changed on the disk what does the OS see? The file’s original contents or the updated contents?

Q2:

What is the overhead of killing and restarting vmtouch?

Background:
Inotify can be unreliable at kicking off your scripts (for example, events are dropped when its event queue overflows). If the vmtouch kill/restart overhead is reasonably trivial, the suggestion is not to add another possible point of failure.

I guess we’d need some sort of real-world example to answer this, which itself brings up a sub-question.

You have better documentation than 90% of all projects out there, but as with everything some really complex examples would help.

Q2-subA: How would you write the vmtouch command(s) to accomplish this?

The goal is to ‘memorize’ just the production websites.

Given the directories:

/var/htdocs/a
/var/htdocs/b
/var/htdocs/c
/var/htdocs/d
/var/htdocs/e

We want to:

  • lock the files
  • include only the production directories ‘a’ and ‘d’
  • exclude all other directories in /var/htdocs
  • include all files named '.htaccess' (for this and all of the items below: in a and d only, not in b, c, or e)
  • include all files named 'robots.txt'
  • include the file pattern '*.php'
  • include the file pattern '*.html'
  • include the file pattern '*.js'
  • include the file pattern '*.css'
  • Bonus points if we can also include the '.htaccess' file from '/var/htdocs'

My guess is: [1]

vmtouch -l -d -I '.htaccess' -I 'robots.txt' -I '*.php' -I '*.html' -I '*.js' -I '*.css' /var/htdocs/a /var/htdocs/d

That might need to be 2 commands? One each for /var/htdocs/a and /var/htdocs/d? If so, that brings up another sub-question:

Q2-subB: What do subsequent vmtouch commands do to the prior vmtouch commands?

Okay, now that we have a possible command (or commands) to work from, is there any real benefit to using inotify, or something like find . -cmin -5, as opposed to doing what Spotify does and just ‘rebuilding’ every 5 to 10 minutes?

Best,
Michael

[1]
Modified from what you wrote at https://news.ycombinator.com/item?id=13403216

Edit: Added daemon switch.

@hoytech
Owner Author

hoytech commented Sep 7, 2018

Hey, sorry for the delay! Good questions.

When a file is locked and the file is then changed on the disk what does the OS see? The file’s original contents or the updated contents?

This depends on how the file is modified. If the file is unlink(2)ed (i.e. with the rm(1) command) and then recreated, the lock will stay on the old file (which is no longer reachable through the file system) and the new file contents will not be locked at all.

On the other hand, if the file's contents themselves are modified, then the lock will continue to apply to the modified pages of the file.

On the other other hand, if a file is appended to, the pages that were present in the file originally will continue to be locked, but the newly appended pages will not be.
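To see which of these three cases applies, a script can compare a file's inode number and size before and after the change. Below is a minimal sketch, assuming GNU stat (FreeBSD and macOS would need `stat -f '%i:%z'`). The helper names are hypothetical.

```shell
# Sketch: classify a change by comparing inode number and size.
# Assumes GNU stat; file_identity and classify_change are hypothetical helpers.
file_identity() {
  stat -c '%i:%s' -- "$1"   # "<inode>:<size in bytes>"
}

# Compare a previously saved identity with FILE's current one:
#   replaced - new inode: the old lock pins the old, unreachable file
#   resized  - same inode, different size: e.g. appended pages are not locked
#   same     - same inode and size: in-place edits stay locked
classify_change() {
  local before=$1 file=$2 after
  after=$(file_identity "$file") || return 1
  if [[ ${before%%:*} != "${after%%:*}" ]]; then
    echo replaced
  elif [[ $before != "$after" ]]; then
    echo resized
  else
    echo same
  fi
}
```

Tools that write a temporary file and rename it over the original, such as `sed -i`, produce the `replaced` case. The old lock then keeps pinning the old contents.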

What is the overhead of killing and restarting vmtouch?

It depends on how big your directory tree is. Usually the overhead is not that high: after restarting, vmtouch just needs to crawl the directory tree again (similar overhead to running find(1), for example), open each file, mmap each one into its memory space, and then mlock each mapping. This usually doesn't involve much disk access because the files and dentries are mostly cached anyway due to the previous vmtouch runs.
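To gauge how big "big" is before restarting, here is a sketch that counts the regular files a crawl will open and map, and the pages it will lock. It assumes GNU find (for `-printf`) and that every page of every selected file gets locked. `crawl_cost` is a hypothetical helper; vmtouch's own summary remains the number to trust.

```shell
# Sketch: estimate the work a vmtouch restart repeats (files crawled/mapped,
# pages locked). Assumes GNU find (-printf); crawl_cost is a hypothetical helper.
crawl_cost() {
  local page
  page=$(getconf PAGESIZE)
  find "$@" -type f -printf '%s\n' |
    awk -v page="$page" '
      { files++; pages += int(($1 + page - 1) / page) }   # round up per file
      END { printf "%d files, %d pages (%.1f MiB)\n", files, pages, pages * page / 1048576 }'
}
```

For example: `crawl_cost /var/htdocs/a /var/htdocs/d`.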

You have better documentation than 90% of all projects out there

Wow thanks!

Q2-subA: How would you write the vmtouch command(s) to accomplish this?

Your command looks reasonable. I would run it with just -v (and not -l -d) and make sure it is picking up all the files you are interested in and leaving out the ones you aren't. I don't think you would need separate vmtouch runs in this case: it should be able to crawl both of those directories.
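For that dry run, a find(1) preview can be compared against the `vmtouch -v` output. Below is a minimal sketch. It assumes vmtouch's `-I` patterns are matched against each file's base name the same way find's `-name` is, which is worth confirming against the `-v` listing. `preview_selection` is a hypothetical helper.

```shell
# Sketch: list the files the -I patterns from the command above should select.
# Assumes -I matches base names like find -name; preview_selection is hypothetical.
preview_selection() {
  find "$@" -type f \( \
      -name '.htaccess' -o \
      -name 'robots.txt' -o \
      -name '*.php' -o \
      -name '*.html' -o \
      -name '*.js' -o \
      -name '*.css' \) -print | sort
}
```

For example: `preview_selection /var/htdocs/a /var/htdocs/d`.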

Q2-subB: What do subsequent vmtouch commands do to the prior vmtouch commands?

I'm not sure exactly what you mean by this. If multiple vmtouch commands lock the same files, the file contents will stay locked until all of those vmtouch commands are killed. In other words, multiple locks can exist on the same file contents.

is there any real benefit to using inotify, or something like find . -cmin -5, as opposed to doing what Spotify does and just ‘rebuilding’ every 5 to 10 minutes?

Good question. I feel like there is not a lot of benefit in inotify and co. in most server-side use cases. The overhead of a periodic rebuild should be fairly small, and it isn't really an intrusive operation. If you have a lot of file modifications going on and are low on memory for the page cache, then it might be beneficial to get the locks added as quickly as possible, but aside from this extreme situation the principle of temporal locality should ensure that most of the pages are kept in cache until the next vmtouch "rebuild", as you put it.

The one major benefit I would see for inotify would be on power constrained devices like phones. In those cases it might be better to not waste power waking up and rebuilding when nothing has changed in the directory tree.
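A cheap change check in the spirit of the `find . -cmin -5` idea lets a periodic job skip the rebuild, and the wake-up work that comes with it, when nothing has changed. This sketch assumes GNU find (other finds may lack `-quit`). `changed_recently` is a hypothetical helper.

```shell
# Sketch: succeed if any regular file under the given paths had its status
# changed (-cmin, as in the find example above) within the last N minutes.
# Assumes GNU find; changed_recently is a hypothetical helper.
changed_recently() {
  local minutes=$1
  shift
  # -quit stops at the first match, so a changed tree is usually cheap to check.
  [[ -n $(find "$@" -type f -cmin -"$minutes" -print -quit) ]]
}
```

For example, a 5-minute cron job could run `changed_recently 6 /var/htdocs/a /var/htdocs/d && /root/bin/vmtouch-wrapper.sh`. The window is slightly longer than the cron interval so that changes made just before a run are not missed.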

Hope this helps!

Doug

@ghost

ghost commented Sep 7, 2018

Thanks Doug!

To clarify for Q2-subB, I’ll complete the workflow I was thinking would be needed for this use case, taking into account your replies, and make it in the same format your examples page uses. Since this would be run as root, I grabbed an existing root script that handles service restarts, so it has more ‘stuff’ than some will need or want. Not tested, but should be mostly correct ;) [1]

Also, I wrote the below for ease of use, since a few extra milliseconds is no big deal, but tens of seconds would be. Which brings up a couple of related ‘theoretical performance’ questions.

  • Will doing a lot of small, or single-file, vmtouch commands cause significantly more overhead than putting them all on one vmtouch command?
  • mlock/mlockall being extremely outside my KB, would there be any difference in using -l as opposed to -L for the below? (Well, just for the directories? Or maybe bundling all the single files together?)

I’m sure the answers to these are really case-specific, so do you have any general guidelines or a rule of thumb we can apply?

Best,
Michael

[1]
It’ll take me a couple months to run through a test server to have something verified and usable on my production servers. I’ll send you a copy of the final file then.

Example 6

Daemonise and lock selected production websites, which are intermixed with non-production websites, into physical memory, together with other web-server-specific (e.g. Apache) files. Use cron to re-run the process every 5 minutes, without allowing a new run to start while the previous one is still running.

Step one

Use the verbose switch -v (replacing -l -d below) to verify you have selected all files you are interested in.

Step two

Create a vmtouch wrapper script.

$ cat /root/bin/vmtouch-wrapper.sh

#!/bin/bash
# # # #
# Script=/root/bin/vmtouch-wrapper.sh
# Author=Michael
# Website=http://www.inet-design.com/
# License=GPLv3
# Last Edit Date : 2018-09-07 15:09
# # # #
# Ref: https://askubuntu.com/questions/764881/apache-startup-script-keeps-lock-file-open
# Many included comments from above script
# # # #
# Cron Entry:
# */5 * * * * /root/bin/vmtouch-wrapper.sh >>/root/log/vmtouch-wrapper.sh.log 2>&1

# # # #
#  Includes / Static
# # # #
#   Prettify things:
# . "/root/bin/includes/pretty.sh"
# cat of above
INVT="\033[7m"
NORM="\033[0m"
BOLD="\033[1m"
BLINK="\033[5m"
BLACK_F="\033[30m"
BLACK_B="\033[40m"
RED_F="\033[31m"
RED_B="\033[41m"
GREEN_F="\033[32m"
GREEN_B="\033[42m"
YELLOW_F="\033[33m"
YELLOW_B="\033[43m"
BLUE_F="\033[34m"
BLUE_B="\033[44m"
MAGENTA_F="\033[35m"
MAGENTA_B="\033[45m"
CYAN_F="\033[36m"
CYAN_B="\033[46m"
WHITE_F="\033[37m"
WHITE_B="\033[47m"
# # # #
# Be Nice
# Don't contend with the server, but stay above all other niceties
renice -n 1 -p $$ >/dev/null 2>/dev/null
# # # #

# MyUUID=`uuidgen`
# MESGID="$MyUUID vmtouch-wrapper.sh, "
MESGID="vmtouch-wrapper.sh, "

D00=`date "+%Y-%m-%d %H:%M:%S"`
MESG=$GREEN_F$BOLD"Starting"$NORM
echo -e "$D00 - $MESGID$MESG"

# # # #
# Code for "If service xyz isn't already running, don't start it."
# You probably won't need this with vmtouch?
# # # #
# service xyz status >/dev/null 2>/dev/null
# if [ $? -ne 0 ]; then
#   D00=`date "+%Y-%m-%d %H:%M:%S"`
#   MESG=$RED_F$BOLD"Exiting"$NORM
#   MESGINFO="  - xyz is not running"
#   echo -e "$D00 - $MESGID$MESG$MESGINFO"
#   exit 0
# fi
# # # #

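FLOCK_FILE='/root/lock/vmtouch-wrapper.sh.lock'
mkdir -p "$(dirname "$FLOCK_FILE")"

# Locking: bash (4.1+) opens the lock file on a free descriptor and stores
# its number in FLOCK_FD; flock -n fails at once if another run holds it.
exec {FLOCK_FD}>"$FLOCK_FILE"
if ! flock -n "$FLOCK_FD"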
then
  D00=`date "+%Y-%m-%d %H:%M:%S"`
  MESG=$RED_F$BOLD"File locked"$NORM
  MESGINFO=" - Exiting"
  echo -e "$D00 - $MESGID$MESG$MESGINFO"
  exit 1
fi

(
  # Close the lock FD in the sub-shell so daemons started from here
  # (vmtouch -d, or Apache via a bad startup script) don't inherit the lock.
  # The lock itself is still held by the outer shell.
  exec {FLOCK_FD}>&-

  # Restart vmtouch:
  vmtouchFLAG=0
  D00=`date "+%Y-%m-%d %H:%M:%S"`
  MESG=$CYAN_F$BOLD"Subshell"$NORM
  MESGINFO=" - Killing vmtouch"
  echo -e "$D00 - $MESGID$MESG$MESGINFO"
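  # -x matches the process name exactly; a bare "vmtouch" pattern would also
  # match this script ("vmtouch-wrapper") and kill it.
  /usr/bin/pkill -x vmtouch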

  SLEEPSECS=2
  SLEEPUNTIL=`date --date="$SLEEPSECS seconds" "+%H:%M:%S"`
  D00=`date "+%Y-%m-%d %H:%M:%S"`
  MESG=$CYAN_F$BOLD"Subshell"$NORM
  MESGINFO=" - vmtouch restart  :Sleeping until: "$SLEEPUNTIL
  echo -e "$D00 - $MESGID$MESG$MESGINFO"
  sleep $SLEEPSECS

  #Production directories

  #Watch indent for blank spaces
ProdDirNames=(
/var/htdocs/a
/var/htdocs/d
/var/htdocs/someotherproductiondirectory
)

  for DirName in "${ProdDirNames[@]}"; do 
    vmtouch -l -d \
      -I '.htaccess' \
      -I 'robots.txt' \
      -I '*.php' \
      -I '*.html' \
      -I '*.js' \
      -I '*.css' \
      "$DirName"
    if [ $? -ne 0 ]; then
      vmtouchFLAG=1
    fi
    #Site specific static files, images, etc.
    vmtouch -l -d "$DirName/favicon.ico"
    if [ $? -ne 0 ]; then
      vmtouchFLAG=1
    fi
  done

  #Apache base specific files
  vmtouch -l -d /var/htdocs/.htaccess
  if [ $? -ne 0 ]; then
    vmtouchFLAG=1
  fi
  vmtouch -l -d /var/htdocs/index.html
  if [ $? -ne 0 ]; then
    vmtouchFLAG=1
  fi
  vmtouch -l -d /var/htdocs/robots.txt
  if [ $? -ne 0 ]; then
    vmtouchFLAG=1
  fi

  #Misc / One off files
  vmtouch -l -d /var/htdocs/a/files/bigimageoneverypage.jpg
  if [ $? -ne 0 ]; then
    vmtouchFLAG=1
  fi
  # vmtouch ...

  if [ $vmtouchFLAG -eq 1 ]; then
    D00=`date "+%Y-%m-%d %H:%M:%S"`
    MESG=$CYAN_F$BOLD"Subshell"$NORM
    MESGINFO=" - vmtouch restart had issues"
    echo -e "$D00 - $MESGID$MESG$MESGINFO"
    #Possibly also want to mail an error report here
  else
    D00=`date "+%Y-%m-%d %H:%M:%S"`
    MESG=$CYAN_F$BOLD"Subshell"$NORM
    MESGINFO=" - vmtouch restarted okay"
    echo -e "$D00 - $MESGID$MESG$MESGINFO"
  fi
)

# Unlock in outer shell because we're done.
exec {FLOCK_FD}>&-

D00=`date "+%Y-%m-%d %H:%M:%S"`
MESG=$BLUE_F$BOLD"Finished"$NORM
MESGINFO=""
echo -e "$D00 - $MESGID$MESG$MESGINFO"

exit

Step three

Add it to cron.

crontab -e
*/5 * * * * /root/bin/vmtouch-wrapper.sh >>/root/log/vmtouch-wrapper.sh.log 2>&1

@hoytech
Owner Author

hoytech commented Nov 9, 2018

Hey, sorry, I was just going through my backlog and noticed I forgot to reply:

Will doing a lot of small, or single-file, vmtouch commands cause significantly more overhead than putting them all on one vmtouch command?

There is a little bit more overhead, but not that much more. As you say it's probably specific to your system so worthwhile benchmarking if you are concerned.
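Here is a rough way to benchmark it. It assumes GNU date (for `%N`) and uses `vmtouch -t` (touch pages into cache without locking) so that no daemons are left behind. Process start-up and crawling cost the same as with `-l -d`, but the locking step itself is not measured. The helper names and the `VMTOUCH` override are hypothetical.

```shell
# Sketch: compare one vmtouch call over many files with one call per file.
# Assumes GNU date (%N); elapsed_ms, bench_batching and VMTOUCH are hypothetical.
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 || true
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

bench_batching() {
  local cmd=${VMTOUCH:-vmtouch} batched single=0 f
  "$cmd" -t "$@" >/dev/null 2>&1 || true   # warm-up so both timings see a hot cache
  batched=$(elapsed_ms "$cmd" -t "$@")
  for f in "$@"; do
    single=$(( single + $(elapsed_ms "$cmd" -t "$f") ))
  done
  echo "one call: ${batched} ms; one call per file: ${single} ms"
}
```

For example: `bench_batching /var/htdocs/.htaccess /var/htdocs/index.html /var/htdocs/robots.txt`.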

mlock/mlockall being extremely outside my KB, would there be any difference in using -l as opposed to -L for the below? (Well, just for the directories? Or maybe bundling all the single files together?)

There isn't that much difference between vmtouch's use of mlock/mlockall, except that mlockall will lock a little bit more memory, but will make fewer system calls. Again, probably very minor difference.
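To see that difference on a given system, here is a sketch that reads the `VmLck` field (locked memory, in kB) from `/proc/PID/status`. That way an `-l` run and an `-L` run over the same files can be compared. It is Linux-only, and `locked_kb` is a hypothetical helper.

```shell
# Sketch: print the locked memory (VmLck, in kB) of a process. Linux-only;
# locked_kb is a hypothetical helper. Fails if the PID or field is missing.
locked_kb() {
  awk '/^VmLck:/ { print $2; found = 1 } END { if (!found) exit 1 }' "/proc/$1/status"
}
```

For example: `for pid in $(pgrep -x vmtouch); do echo "$pid: $(locked_kb "$pid") kB locked"; done`.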
