notes

3.1.4: 2010-xx-xx

  IMAP QUOTA works.

  Smarthost SIZE is forwarded to SMTP submit clients.

  Several bugs and one missing feature:

  1. INTHREAD doesn't work now. Not sure why.

  2. INTHREAD OLDER and similar searches don't work; the date-based
     searches look at the outer message.

  3. THREAD needs retesting. It may be broken.

  4. COMPRESS now reimplemented, and may work better with iPhone 4.

  IMAP::emitResponses still needs a tweak or two.

  And the feature: automatic flag-based views. Almost everything is
  ready.

  SORT(SUBJECT) is broken. The only way to fix it is to add a postgres
  function to implement the moronic baseness algorithm in the RFC. I can't
  write that sort of thing.


HelperRowCreator

  If that were to use a Cache, message injection would be speeded up
  significantly.


Open Bug: aox add view broken

  Ron Peterson reports that aox add view breaks.

  aox add view /users/auser/someone /users/auser/archive auser address someone@gmail.com
  aox: Can't create view named /users/auser/someone

  No idea what's wrong (yet).

  1b149d fixed the immediate view creation problem, but the view doesn't
  work yet. -- AMS 20100413

  Views now work, but threaded views still don't.

  Notify: Ron Peterson <rpeterso@mtholyoke.edu>
  Message-Id: <20100405132205.GL27702@mtholyoke.edu>


Plaintext passwords work, sort of

  allow-plaintext-passwords almost works. But if we get a cleartext
  password and that's disallowed, we ought to treat it as bad, and
  notify the admin that a change of passwords is necessary.

  Send email to Timo when that works, he suggested the feature
  (unknowingly).


Open bug format blah

  I envision the format thus:

  "Open Bug: " as headline prefix to mark it for the processor.

  The rest of the headline must never change; it'll be used to look up
  the web page. This is basically the bug ID. IDs can be reused as
  soon as the bug is closed, though.

  The body text contains plain old text, which will be exported, and
  which may be changed at will.

  Lines matching "Notify: " specify an email address. The email
  address is not published on the web site.

  When a commit affects a note, the script sends a notification to all
  specified addresses. It doesn't include the patch though. When a
  commit removes a note, the script tells people "fixed".

  We could try to set the References field on responses... but I don't
  care to. It would be neat, but I don't care enough to do it. Or
  maybe I do. If there's a Message-ID field, we use that in
  References, otherwise we skip it?
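
  A minimal sketch of the processor side, in Python. The function name
  parse_open_bugs and the dict layout are made up for illustration. It
  assumes that a headline is any unindented line and that body lines
  are indented, as in this file.

```python
def parse_open_bugs(text):
    """Return {bug id: {"body": [...], "notify": [...], "message_id": ...}}."""
    prefix = "Open Bug: "
    bugs = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # An unindented line is a headline; only "Open Bug: " ones count.
            current = None
            if line.startswith(prefix):
                current = {"body": [], "notify": [], "message_id": None}
                bugs[line[len(prefix):].strip()] = current
        elif current is not None:
            stripped = line.strip()
            if stripped.startswith("Notify: "):
                # Addresses are kept for notifications, never exported.
                current["notify"].append(stripped[len("Notify: "):])
            elif stripped.lower().startswith("message-id: "):
                # Would go into References on the notification, if present.
                current["message_id"] = stripped[len("Message-Id: "):]
            elif stripped:
                current["body"].append(stripped)
    return bugs
```

  The rest of the headline is used as the dict key, which matches the
  idea that it is the bug ID and must never change.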

  Is this good?


P/W: Stuff we may want to keep

  List::take()->remove() was good

  also keeping iterators working

  Scope

  Having EH make a new Log by default

  Simplifying the way Connection objects are added to the main loop

  List::append( List ) -> appendList()

  smtpclient has simpler logic... but don't break what works

  cancelQuery rewrite. not sure.

  Do (some of) these, one at a time, making sure nothing breaks after
  each one. None soon.


A web page about spam filtering

  Containing neat queries, such as "tell me whether the user with
  address x has mail from y not tagged as spam", or "tell me whether
  user x has sent mail to domain y".

  With aox that can be done, uniquely. So mention it.


Functions and views

  Views that'll be good for people:
  - Valid local email addresses and users
  - Sender information
    - How many earlier messages from that address
    - How many messages to that address
    - How many earlier messages to/from the domain

  Functions:
  - Create user
  - Delete user
  - Rename user
  - Change password
  - Change password, given old and new password, checking
  - Enable/disable alias
  - Add/delete/rename alias


Installer doesn't take steps to ensure that the installation is usable

  It could do at least two things:

  Run all the same checks on the new installation as 'aox check' and
  archiveopteryx at startup.

  Try to connect to all the server addresses and if anything's
  listening anywhere, mention it on stdout.

  In addition to this, the installer does the wrong thing right now if
  it creates the database users and then fails to run psql to load the
  schema. It exits with an error, which means the randomly-generated
  passwords are lost, because the configuration file is not written.


db-address=localhost works, but needs improvement.

  - A new connect(addr, port) function resolves the given address and
    creates one SerialConnector object for each result. It starts the
    first one, which (after an error, or a delay of 1s) initiates the
    next connection in line and so on. The first one to connect swaps
    out the d of the original connection with its own, and makes the
    EventLoop act as though it had just connected.

  It works, but the code is a little ugly. The error handling logic
  needs a careful look after some time. Once that's solid, the other
  callers (SpoolManager/SmtpClient etc.) can be converted.
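
  The staggered start can be sketched like this, in Python with
  asyncio rather than our EventLoop. connect_first and the fake
  connectors are hypothetical names; the real code swaps the d of the
  original connection instead of returning a value.

```python
import asyncio

async def connect_first(connectors, delay=1.0):
    """Start connectors one at a time and return the first success.

    connectors is a list of zero-argument coroutine functions. The next
    one is started when a running attempt fails or when delay seconds
    pass. Returns None if every attempt fails.
    """
    pending = set()
    started = 0
    winner = None
    while winner is None and (started < len(connectors) or pending):
        if started < len(connectors):
            pending.add(asyncio.ensure_future(connectors[started]()))
            started += 1
        # Only use a timeout while there is another attempt left to start.
        timeout = delay if started < len(connectors) else None
        done, pending = await asyncio.wait(
            pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if winner is None and task.exception() is None:
                winner = task.result()
    # Whoever lost the race is cancelled.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    return winner

# Fake connectors standing in for one SerialConnector per resolved address.
async def refused():
    raise OSError("connection refused")

async def slow():
    await asyncio.sleep(0.5)
    return "slow"

async def quick():
    return "quick"
```

  For example, asyncio.run(connect_first([refused, quick])) moves on to
  the second address as soon as the first one fails, without waiting
  for the delay.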


aox check schema

  This command would check several things.

  a) that dbuser has the needed rights
  b) that all the right tables are there, and all the right columns,
     with the right types, and no unexpected constraints
  c) that all the right indexes are there
  d) that dbowner owns everything
  i) that inserts that would duplicate a constraint are properly
     recognised

  As a bonus, perhaps it could list some unexpected/unknown deviations:
  e) locally added tables
  f) locally added columns
  g) locally added indices
  h) missing constraints

  Change 43939 and following move towards this: the idea is to
  introduce new functions e.g. Schema::checkIntegrity (in addition to
  checkRevision) and Schema::grantPrivileges, that can be used both by
  aox check schema/aox grant privileges/whatever, and also by the
  installer (instead of lib/grant-privileges, and instead of the
  half-hearted checking it does now). The server is essentially
  unaffected, it just uses Database::checkSchema/checkAccess for a
  quick check.

  This sounds ok, but it's ugly because Schema::execute is completely
  given to upgrading the schema, and neither can nor should be
  repurposed to do other things besides. So that means more static
  functions in Schema and separate EventHandlers to do the
  checking/granting/whatevering. But that's okay.


Cleartext passwords

  We help people migrate away from cleartext/plaintext passwords:

  1. We also store SCRAM and similar secrets in the DB (secrets which
     aren't password equivalents)
  2. We extend the users table with two new columns, 'last time
     cleartext was needed' and 'number of successful authentications
     without cleartext password usage since cleartext'.
  3. If a client uses SCRAM, we increment the counter.
  4. If a client uses CRAM or PLAIN, we reset counter and set the time
     to today.
  5. We provide some helping code to delete passwords for users with a
     high count and a long-ago time.
  6. We add documentation saying that if you disable auth-this and
     auth-that, you can disable store-plaintext-passwords.
  7. We add configuration/db sanity checks for ditto.
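
  Steps 2 to 5 could look roughly like this. A sketch only: the class
  and field names are invented, the thresholds are arbitrary, and
  mechanisms other than SCRAM-*, CRAM-MD5 and PLAIN are simply ignored.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

NEEDS_CLEARTEXT = {"CRAM-MD5", "PLAIN"}

@dataclass
class UserAuthStats:
    # The two proposed columns in the users table.
    last_cleartext_needed: Optional[date] = None
    logins_without_cleartext: int = 0

def record_login(stats, mechanism, today):
    """Update the counters after a successful authentication."""
    mechanism = mechanism.upper()
    if mechanism.startswith("SCRAM"):
        stats.logins_without_cleartext += 1
    elif mechanism in NEEDS_CLEARTEXT:
        stats.logins_without_cleartext = 0
        stats.last_cleartext_needed = today

def may_delete_password(stats, today, min_logins=20, min_age_days=90):
    """High count and a long-ago time: the cleartext password can go."""
    if stats.logins_without_cleartext < min_logins:
        return False
    if stats.last_cleartext_needed is None:
        return True
    return today - stats.last_cleartext_needed >= timedelta(days=min_age_days)
```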


Database schema range

  People occasionally need to access the db with an old version of
  mailstore. I suggest that we:

  a) add a 'writable_from' column specifying the oldest version that
     can write to the database.
  b) add a 'readable_from' column specifying the oldest version that
     can read the database

  aox upgrade schema would update writable_from to the oldest schema
  version for which a writer would do the right job. This would often
  change when a table changes, but not when a table is added.

  readable_from would be the oldest revision that can read the database.

  When the server starts up, it would check:

  - am I >=writable_from? If so, mailboxes can be read-write
  - else, am I >=readable_from? If so, startup can proceed, but all
    mailboxes are read-only. lmtp, smtp and smtp-submit do not start.
  - else, quit.
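
  The startup check is small enough to write down. This sketch assumes
  revisions are plain integers and that the revision named in
  writable_from/readable_from is itself good enough ("oldest version
  that can..."). The function name is made up.

```python
def startup_mode(my_revision, writable_from, readable_from):
    """Decide how the server may start against this database."""
    if my_revision >= writable_from:
        return "read-write"
    if my_revision >= readable_from:
        # lmtp, smtp and smtp-submit must not start in this mode.
        return "read-only"
    return "quit"
```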

  And in order to handle database updates, I suggest another table,
  'features', with a single string column. When aox update database
  fixes something, it inserts a row into features. A modern database
  would have two rows in this table, 'numbered address fields' and 'no
  nulls in bodyparts'.


Bounces and DSNs

  Mail is currently fairly reliable. There is one big exception:
  Bounces aren't 100% parsable. But generally, if you work hard, you
  can know whether a message was delivered or not, and mostly they are
  delivered.

  So we benefit from converting the most common nonstandard bounces to
  DSNs, and then treat them as DSNs.

  For nonstandard bounces (like those of qmail) we identify the
  message by trying hard, do some hacky parsing, use the bounce
  (excluding trailing message) as first part of the DSN multipart,
  cook up a new DSN report based on the parsing, and save
  text/822-headers as a third part.

  Then, searches that tie bounces together with messages sent work
  even better.

  (Another trick we can/should use is to see whether the host we
  deliver to seems to be the final destination based on earlier
  (answered) messages.)


imapd/handlers/acl.cpp

  Different tasks, some shared code, same file. Separate this out into
  different classes inheriting something. Then add the right sort of
  logging statement to the end of parse().


Full-text search

  There Be Problems.

  The code now assumes that the IMAP client searches for one or more
  words, rather than an arbitrary substring. Postgres uses word
  segmentation.

  If postgres were to use e.g. overlapping three-letter languageless
  substrings, we would do what IMAP wants. Sounds senseless.

  We also have a requirement to stem search arguments less.
  Specifically, a search for ARM7TDMI should not return messages about
  the ARM6 or about my left arm.


Convert more parsers to use AbnfParser

  There are still a few places where we roll our own messy parsers and
  suffer for it (e.g. HTTP, DigestMD5). We know they work, but making
  them use AbnfParser in a spare moment would be an act of kindness.


aox/conf/tls-certificate

  Those variables are not well described. We need a bit more.

  Also, -secret is probably misnamed, we use -password for other
  cases. I expect that's why aox show cf tls-certificate-secret yields
  the value while e.g. aox show cf db-password does not.


Autoresponder

  We have vacation now, but it isn't quite right for autoresponses.
  Sieve autorespond should be like this:

  1. :quote should quote the first text/plain part if all of the
     following are true:

     1. The message is signed, and the signature verified (using any
        supported signature mechanism, DKIM SHOULD be supported).
     2. The first text/plain part does not have a Content-Disposition
        other than inline.

     If any of the conditions aren't true, :quote shouldn't quote.

     If there's a signature block, :quote shouldn't quote that.

     If the quoted text would be more than ten lines, :quote may crop
     it down as much as it wants, ideally by skipping lines starting
     with '>', otherwise by removing the last lines.

  2. :subject, :from and :addresses as for vacation.

  3. :cc can be used to send a copy to the specified From address.

  4. The default :handle should not be based on the quoted text.

  5. Two text arguments, one for text before the quoted text, one for
     text after the quoted text.

  6. The autoresponse goes to the envelope sender, as some RFC
     requires. So we want an option to skip the response unless the
     return-path matches reply-to (if present) or From (unless
     reply-to is present).
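
  The cropping rule in point 1 could be as simple as this. A sketch;
  crop_quote is a made-up name, and it takes the liberty of dropping
  all '>' lines once the text is too long, since :quote may crop as
  much as it wants.

```python
def crop_quote(lines, limit=10):
    """Crop quoted text to at most limit lines."""
    if len(lines) <= limit:
        return list(lines)
    # Ideally, skip lines that are themselves quotes.
    kept = [line for line in lines if not line.startswith(">")]
    # Otherwise (or additionally), remove the last lines.
    return kept[:limit]
```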


Message arrival tag

  Once annotate is done, we want a tag, i.e. a magic annotation which
  stays glued to the message wherever it goes, even after copy/move.

  We also want a way to store the original RFC822 format somewhere
  inside and/or outside the database, indexed by the arrival tag
  identifier. It's good if the tag is split, so we can have "X-Y"
  where X is the CD/DVD number and Y is the file on the CD/DVD. Or
  something like that.


Sieve ihave

  There are three holes in our ihave rules.

  Single-child anyof doesn't promote the ihave:

    if anyof( ihave "foo" ) {
        foo; # errors should not be reported here
    }

  Not doesn't promote:

    if not not ihave "foo" {
        foo; # errors should not be reported here
    }

  Finally, if/elsif always applies the ihave to its own block, instead
  of walking along elsif/else to find the block that might be executed
  if ihave returns true:

    if not ihave "foo" {
        # errors should be reported here
    } else {
        foo; # but not here
    }


C/R

  C/R sucks. But it has its uses, so we can benefit from implementing
  it somehow. Here are some classes of messages we may want to treat
  specially:

  - replies to own mail
  - messages in languages not understood by the user
  - mail from previously unknown addresses
  - mail from freemail providers
  - vacation responses from unknowns
  - messages likely, but not certain to be out-of-office-autoreply
  - dkim/mass-signed messages (if verified)

  The questions are: How can we ensure that we almost never challenge
  real mail, while simultaneously challenging most/all messages that
  don't come from valid senders? How can we provide suitable
  configuration?

  Mail from freemail vendors tends to have a "Received: ... via HTTP"
  field.


Using rrdtool

  What could we want to graph with rrdtool? Lots.

  - CPU seconds used
  - database size
  - messages in the db
  - average response time
  - 95th percentile response time
  - messages per user
  - message size per user
  - average query execution time
  - average query queue size

  More?

  http://jwatt.org/svg/authoring/ is interesting for generating graphs
  via the web interface.


We should be able to use a read-only local database mirror.

  That way, we can play nicely with most replication systems.

  The way to do it: add a new db-mirror setting pointing to a
  read-only database mirror. All queries that update are sent to
  db-address, all selects are sent to db-mirror. db-mirror defaults to
  db-address.
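
  The routing decision, sketched in Python. pick_database is a
  hypothetical helper, and looking only at the first word is an
  assumption that ignores transactions, "select ... for update" and
  selects that call functions with side effects.

```python
def pick_database(query, db_address, db_mirror=None):
    """Return the address a query should be sent to."""
    # db-mirror defaults to db-address.
    mirror = db_mirror or db_address
    words = query.split(None, 1)
    if words and words[0].lower() == "select":
        return mirror
    return db_address
```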


We should test multipart/signed and multipart/encrypted support.

  We must add a selection of RFC 1847 messages to canonical, and make
  sure they survive the round trip. No doubt there will be bugs.


Per-user client certificates

  We could store zero or more client certificates (or fingerprints, or
  whatever) per user. When a user has logged in, we'd check whether
  that user has a non-zero list of certificates, and if so, we'd do a
  TLS renegotiation, this time demanding a client certificate. If the
  client certificate matches, we allow access, otherwise we don't (and
  we alert the user).

  A bit difficult to do with the hands-off tlsproxy.


We should store bodyparts.text for PDF/DOC.

  We need non-GPLed code to convert PDF and DOC to plaintext.

  Or maybe we need a generic interface to talk to plugins.


Switch to using named constraints everywhere.


Default c-t-e of PGP signatures

  Right now we give them binary. q-p or 7bit would be better, I think.

  What other application/* types are really text?

  From a conversation the other day: we could avoid base64 encoding an
  entity whose content-type is not text if it contains only printable
  ASCII. I don't know if it's worth doing, though.

  The problem with doing that is that it treats sequences of CR LF, CR
  and LF as equivalent. An application/foobar object that happens to
  contain only CR, LF and printable ASCII can be broken.
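
  One way around that would be to skip base64 only when every line
  ending is already CR LF, so there is nothing to confuse. A sketch
  under that assumption; the function name is made up and the line
  length limit is not checked.

```python
def can_skip_base64(data: bytes) -> bool:
    """True if data is printable ASCII with only CR LF line endings."""
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:
            # CR is only acceptable as the first half of CR LF.
            if i + 1 >= len(data) or data[i + 1] != 0x0A:
                return False
            i += 2
            continue
        if byte == 0x0A:
            # A bare LF could be rewritten in transit.
            return False
        if not 0x20 <= byte <= 0x7E:
            return False
        i += 1
    return True
```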


Recognising spam

  The good spam filters now all seem to require local training with
  both spam and nonspam corpora. We can do clever stuff... sometimes.

  Instead of filtering at delivery, we can filter when a message
  becomes \recent. When we increase first_recent, we hand each new
  message to the categoriser, and set $Spam or $Nonspam based on its
  answer.

  This lets the categoriser use all the information that's available
  right up to the moment the user looks at his mail.

  We can also build corpora for training easily. All messages to which
  users have replied are nonspam, replies to messages from local users
  are nonspam, messages in certain folders are spam, messages with a
  certain flag are spam.

  We can connect to a local server to ask whether a message is spam.
  They seem to work that way, but with n different protocols.


Faster mapping from unicode to 8-bit encodings

  At the moment, we use a while loop to find the right codepoint in an
  array[256]. Mapping U+00EF to latin-1 requires looping from 0 to
  0xEF, checking those 240 entries.

  We could use a DAG of partial mappings to make it faster. Much
  faster. Mapping U+20AC to 8859-15 would require just one lookup: In
  the first partial table for 8859-15. Mapping U+0065 to 8859-15 would
  require three: In the first (U+20AC, one entry long), in the
  fallback (U+00A0, 96 entries long) and in the last (U+0000, 160
  entries long).

  Effectively, 8859-15 would be a first table of exceptions and then
  fall back to 8859-1.

  The tables could be built automatically, compiled in, and would be
  tested by our existing apparatus.

  Or we could do it simpler and perhaps even faster: Make a local
  array from unicode to target at the start, fill it in as we go, and
  do the slow scan only when we see a codepoint for the first time.
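
  The simpler variant, sketched in Python. LazyEncoder is a made-up
  name, and the 256-entry table is borrowed from Python's codecs here
  instead of our own tables.

```python
class LazyEncoder:
    """Map code points to bytes in an 8-bit encoding, caching as we go."""

    def __init__(self, codec_name):
        # table[i] is the code point byte i decodes to (None if undefined),
        # like the array[256] described above.
        self.table = []
        for i in range(256):
            ch = bytes([i]).decode(codec_name, errors="replace")
            self.table.append(None if ch == "\ufffd" else ord(ch))
        # Filled in as we go.
        self.cache = {}

    def encode_char(self, codepoint):
        """Return the byte for codepoint, or None if it cannot be encoded."""
        if codepoint not in self.cache:
            # Slow scan, only the first time we see this code point.
            self.cache[codepoint] = self._slow_scan(codepoint)
        return self.cache[codepoint]

    def _slow_scan(self, codepoint):
        for byte, candidate in enumerate(self.table):
            if candidate == codepoint:
                return byte
        return None
```

  Unencodable code points are cached too, so repeated misses do not
  cost a full scan each time.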


Multipart/signed automatic processing

  We could check signatures automatically on delivery, and reject bad
  signed messages.

  The big benefit is that some forgeries are rejected, even though the
  reader and the reading MUA don't do anything different.

  The disadvantage is that we (probably?) can't verify all signatures,
  which gives a false sense of security for the undetectable forgeries.

  In case of PKCS7, it's possible to self-sign. Those we cannot
  check. In that case we remove the signature entirely from the MIME
  structure, so it doesn't look checked to the end-user.

  PGP cannot be checked, except it sort of can. We can have a small
  default keyring including the heise.de CA key and so on, and treat
  that as root CAs, using the keyservers to dig up intermediate keys.


PGP automatic processing

  Apparently there are five different PGP wrapping formats. We could
  detect four and transform them to the proper MIME format.


Plugins

  It's not given that we want to accept all mail. If we don't, who
  makes the decision? A sieve script may, and refuse/reject mail it
  does not like. And a little bit of pluginnery may. I think we'd do
  well to support the postfix plugin protocol, so all postfix policy
  servers can work with aox. (All? Or just half? Doesn't postfix have
  two types of policy plugins?)

  We may even support site-wide and group-wide sieve scripts and
  permit a sieve script to invoke the plugin. A sieve statement like
  this?

     UsePolicyServer localhost 10023 ;


BURL

  If the message is multipart and the boundary occurs in a part, that
  part needs encoding. Or else switch to a different body.


Delaying seen-flag setting

  We can move the seenflagsetter to imapsession, build up flags to
  set, flush the write cache before fetch flags, store, state-altering
  commands and searches which use either modseq or flags.

  This ought to cut down the number of transactions issued per imap
  command nicely.


Sending forged From despite check-sender-addresses

  vacation :from and notify :mailto :from don't check the address.

  The injector probably needs to get the logic from the smtp server.


Per-group and systemwide sieves

  People always seem to want such things. It'll be easy to implement.
  Most of the tricky issues are described in
  http://tools.ietf.org/html/draft-degener-sieve-multiscript-00


The Sieve "header" test may fail

  Write a test or three that feeds the thing a 2047-encoded header
  field and checks that it's correctly matched/not matched. Then make
  it pass.


The subaddress specification says foo@ != foo+@ wrt. :detail

  The former causes any :detail tests to evaluate to false, while the
  latter treats :detail as an empty string. We treat both as an empty
  string.

  (We could set detail to a single null byte, to \0\r\0\n\0, to a
  sequence of private-use unicode characters, or even to
  Entropy::string( 8 ) if there is no separator. The chance of that
  appearing in an address is negligible.)
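
  The distinction is easy to keep if "no detail" and "empty detail" are
  different values. A sketch, assuming '+' as the separator and a
  made-up function name:

```python
def detail(localpart, separator="+"):
    """Return the :detail part, "" if empty, None if there is no separator."""
    user, found, rest = localpart.partition(separator)
    # None means any :detail test evaluates to false.
    return rest if found else None
```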


SMTP extensions

  Here are the ones we still don't implement, but ought to implement
  at some point:

  DELIVERBY (RFC 2852): At some time.

  http://www.iana.org/assignments/mail-parameters

  DELIVERBY has the funny little characteristic that we can support it
  with great ease iff the smarthost does, so we ought to advertise it
  iff the smarthost does.


The groups and group_members tables seem a little underused

  We do not use them at all. We meant to use them for "advanced" ACL
  support, but nobody ever asked, and it didn't seem worthwhile.

  I now think it's worthwhile.

  Here's what I want to add:

  Make a superusers group, whose members can authenticate as anyone,
  and the notion of group admins, who can authenticate as other
  members of the group.

  Or maybe an administrator table, linking a user to either a group or
  to null. If a group, then the admin can authenticate as other
  members of that group and (importantly) has 'a' right on their
  mailboxes, if null, then ditto for all groups.

  Extend Permissions to link against group_members when selecting
  applicable permissions.

  Make groups be permissible ACL identifiers.


We need to be able to disable users

  - Reject mail with 5xx/4xx.
  - Prevent login.
  - Both of the above.
  - a group admin can enable/disable group members
  - a superadmin can enable/disable anyone
  - a group admin cannot unblock an overall blockage


Dynamically preparing often-used queries

  We can prepare queries cleverly.

  Inside Query, at submit time, we first check whether a Query's text
  matches a PreparedStatement, and use it if so.

  If not, we check whether the query looks preparable. The condition
  seems to be simple: Starts with 'select ' and contains no numbers.
  If it's preparable we add it to a cache, which is discarded at GC
  time.

  If a preparable query is used more than n times before the cache is
  discarded, we prepare the query and keep the PreparedStatement
  around.
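
  The bookkeeping, sketched in Python. PrepareTracker is a hypothetical
  name. The sketch reads "contains no numbers" as "no digits other than
  $n placeholders", which is an assumption.

```python
import re

class PrepareTracker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = {}       # candidate cache, discarded at GC time
        self.prepared = set()  # stands in for the PreparedStatements we keep

    @staticmethod
    def preparable(text):
        # Ignore $1, $2, ... when looking for numbers.
        without_placeholders = re.sub(r"\$\d+", "", text)
        return (text.startswith("select ")
                and re.search(r"\d", without_placeholders) is None)

    def submit(self, text):
        """Return True if the query should run as a prepared statement."""
        if text in self.prepared:
            return True
        if not self.preparable(text):
            return False
        self.counts[text] = self.counts.get(text, 0) + 1
        if self.counts[text] > self.threshold:
            # Used more than n times before the cache was discarded.
            self.prepared.add(text)
            del self.counts[text]
            return True
        return False

    def gc(self):
        self.counts.clear()
```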


Defending against PGP Desktop and similar

  There are several more things to do:

  - Guess that it's repeating a query for smallish UID sets and do the
    query once and for all.

  - Defend against 'OR BODY asdf BODY asd' by recognising in
    simplify() that when asdf matches, asd always will. Added bonus
    for the base64 shit.

  - Hack in Search::parse() for that/those specific search keys, and
    setting up a more sensible selector.

  The first two make sense IMO.
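
  The second item boils down to this: if a body contains asdf it also
  contains asd, so in an OR only the shorter string matters. A sketch
  on bare strings rather than Selector objects; the function name is
  made up.

```python
def simplify_or_body(needles):
    """Given BODY substrings ORed together, drop the redundant ones."""
    kept = []
    # BODY matching is case-insensitive; shortest needles first.
    unique = set(n.lower() for n in needles)
    for needle in sorted(unique, key=lambda n: (len(n), n)):
        # A needle containing an already kept one can never add matches.
        if not any(shorter in needle for shorter in kept):
            kept.append(needle)
    return kept
```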


New RFCs

  5463: sieve ihave
  5490: sieve metadata

  5442: lemonade profile-bis


Sieve notify

  5435: sieve notify
  5436: sieve notify mailto
  5437: sieve notify xmpp

  mailto: combined with :from is unchecked


Bug confusing U+ED00 and U+0000 in the message cache

  When we write to the database, U+0000 (which occasionally occurs,
  mostly by mistake but sometimes on purpose) is transformed to
  U+ED00, and when we read it, back.

  So if U+ED00 is written to the DB, it comes back as U+0000.

  This means that Archiveopteryx works differently depending on
  whether the cache is used or not. That has to be resolved somehow.
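
  The asymmetry in a few lines of Python. The two function names are
  stand-ins for whatever the database layer does on write and read.

```python
def to_db(text):
    # U+0000 cannot be stored, so it is written as U+ED00.
    return text.replace("\u0000", "\ued00")

def from_db(text):
    # On reading, every U+ED00 becomes U+0000, including genuine ones.
    return text.replace("\ued00", "\u0000")

original = "a\ued00b"
cached = original                     # what the message cache holds
reloaded = from_db(to_db(original))   # what a fresh fetch returns
```

  Here cached and reloaded differ, while a string containing U+0000
  survives the round trip unchanged.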


Axel, /Mime problem

  The problem is that the VCF file contains literal NUL bytes, but is
  sent with a text/* MIME type, and we're mangling the NULs during
  charset conversion (or so I guess, given that they become '?'s
  instead).


Various alias-related feature requests

  e.g. Benjamin wants empty localparts, a number of people want multiple
  targets (Axel, Ingo).


Axel, /Unable to fetch 12MB mail

  Some sort of loop in the fetcher? I didn't look.


Problems found in 3.0.0 by Timo

  - SEARCH SENTON/SENTBEFORE/SENTAFTER have some bugs.
    (Not verified because of segfault; will check later.)

    Not fixed; I lean towards fixing it if it's the only thing
    imaptest complains about.


Caching search results

  If a selector is !dynamic(), its results can be cached until the next
  modseq change on the mailbox.
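
  A sketch of such a cache. SearchCache, selector_key and run are
  hypothetical; the key would have to be some canonical form of the
  selector.

```python
class SearchCache:
    def __init__(self):
        # (mailbox id, selector key) -> (modseq, result)
        self.entries = {}

    def search(self, mailbox_id, selector_key, modseq, dynamic, run):
        """Return cached results if still valid, else call run()."""
        if dynamic:
            # Dynamic selectors can change without a modseq change.
            return run()
        key = (mailbox_id, selector_key)
        hit = self.entries.get(key)
        if hit is not None and hit[0] == modseq:
            return hit[1]
        result = run()
        self.entries[key] = (modseq, result)
        return result
```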


Bugs

  - "aox start" doesn't complain when there's a schema mismatch error.


Gmail compatibility

  https://developers.google.com/gmail/imap_extensions is easy, but
  requires https://support.google.com/mail/answer/7190?hl=en, which is
  simple but tiresome. Oh no.

  Steps:

  - Implement special-use mailboxes. Will need trash, inbox, allmail,
    but don't need all of them right away.

  - A trigger on mailbox_messages insertion:
      if the mailbox has an owner:
        look up the owner's allmail and trash mailboxes
        if this is allmail:
          insert the message into those of the owner's other mailboxes
          that contain >=1 message with the same messages.thread_root
        else if this is trash:
          remove from allmail (perhaps?)
        else:
          make sure the message is also in the owner's allmail
          also insert all other messages that are in the owner's allmail
          and have the same messages.thread_root.

    This trigger ensures that mailboxes tend to contain complete
    threads, that all mail contains everything except trash, and that
    delivery just magically picks all the right mailboxes.

  - Make expunge move things to trash, with a time, and delete
    messages after 30 days. Probably best done as another trigger.
    Expunge in allmail expunges everywhere except trash.

  - X-GM-blah except the RAW search key

  - X-GM-blah RAW search key. work.
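

  A toy in-memory model of the mailbox_messages trigger from the steps
  above, to check that the rules behave as intended. Not the real
  trigger: the data layout is invented, and leaving trash out of "the
  owner's other mailboxes" is an assumption.

```python
def on_insert(store, owner, mailbox, message, thread_root):
    """Apply the trigger rules after inserting message into mailbox.

    store maps owner -> {mailbox name: set of message ids};
    thread_root maps message id -> thread root id.
    """
    if owner is None:
        return  # mailboxes without an owner are left alone
    boxes = store[owner]
    allmail = boxes["allmail"]
    boxes[mailbox].add(message)
    root = thread_root[message]
    if mailbox == "allmail":
        # Copy into every other mailbox that already holds the thread.
        for name, contents in boxes.items():
            if name in ("allmail", "trash"):
                continue
            if any(thread_root[m] == root for m in contents):
                contents.add(message)
    elif mailbox == "trash":
        allmail.discard(message)  # the "(perhaps?)" branch
    else:
        allmail.add(message)
        # Pull in the rest of the thread from allmail.
        boxes[mailbox].update(m for m in allmail if thread_root[m] == root)
```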