Refactoring interview transcripts #104

christophe-lejeune · 2020-03-28T13:44:35Z

The transcript editing form contains an unexplained bug: when editing, speech turns may be duplicated or disappear. We have been able to reproduce the bug but, despite years of investigation, have been unable to understand its cause. The problem has therefore not been fixed.

To prevent the problem from occurring, the transcript editing form thus needs to be refactored. At this stage, I propose giving interview transcripts an editing form similar to that of other types of memo (more advanced solutions may be envisioned later). Some history of the software's development puts this into perspective: the transcript type is the legacy format for the original texts, inherited from the first version of Cassandre, when a “corpus” consisted only of “texts” (with no other type of memo). As a consequence, memos currently considered transcripts are "untyped" (they do not contain a `type` field) and their field referring to the diary is named `corpus`.

A new type of memo would be introduced. It would differ from the current "untyped" memos in the following manner:

It has a `type` field (set to `interview`).
The `corpus` field is renamed `diary`.
Like other types of memo, its content is stored in a `body` field (a string, laid out in MarkDown).
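To make the difference concrete, here is a minimal sketch of the two document shapes as Python dictionaries. Only `type`, `corpus`, `diary`, `body` and `speeches` come from this issue; the `_id` and `name` fields, the `actor`/`text` keys inside each speech and all sample values are assumptions made for illustration.

```python
from typing import Any, Dict


def is_legacy_transcript(memo: Dict[str, Any]) -> bool:
    """Return True for an "untyped" memo, i.e. one without a `type` field."""
    return "type" not in memo


# Legacy transcript: no `type`, diary referenced by `corpus`,
# content stored as an array of speech objects (keys are hypothetical).
legacy_memo = {
    "_id": "memo-1",
    "name": "Interview with A",
    "corpus": "diary-42",
    "speeches": [
        {"actor": "Interviewer", "text": "How did you start?"},
        {"actor": "A", "text": "By accident."},
    ],
}

# Proposed interview memo: typed, diary referenced by `diary`,
# content stored as a single string.
interview_memo = {
    "_id": "memo-2",
    "name": "Interview with B",
    "type": "interview",
    "diary": "diary-42",
    "body": "**Interviewer:** How did you start?\n\n**B:** Last year.",
}
```

With these shapes, telling the two formats apart only requires checking for the `type` field.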

This refactoring would follow these steps:

As a transitional measure, new transcripts will be created as field memos (b40ae8e).
All views and lists dealing with types of memo will be adapted to the new type of memo.
When the platform is ready for the new type of transcripts, the button used to create transcripts will create new interview memos (instead of field memos).
Lexicometrics features (repeated segments, rare and specific words) will be refactored in order to process the body string instead of the speeches array of objects. As before, lexicometrics features will only be computed for transcripts.
It should be possible to adapt lexicometrics features so that "actors" (at the beginning of each line) are excluded from the computation.
Either the editing form or the template used to render the transcript will be provided with code that converts former "untyped" memos into new "transcript" memos, so that each edit of a transcript results in a conversion from the older format to the new one. This conversion includes inserting the “transcript” type and converting the speeches field (an array of objects) into a body field (a string, laid out in MarkDown). The code currently used to generate the “preview” could be reused. To guard against errors that may occur during conversion, the speeches field could perhaps be kept in the converted document (but ignored when the document is rendered).
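The conversion and the actor exclusion described in these steps could look roughly like the sketch below. It is not the code used in Cassandre: the `actor` and `text` keys, the function names, the `**actor:** text` line layout and the blank line between speech turns are all assumptions. The type label is a parameter because this proposal uses both "interview" and "transcript" for the new type.

```python
import copy
import re
from typing import Any, Dict, List

# Matches a bold actor prefix such as "**Interviewer:** " at the start of a line.
ACTOR_PREFIX = re.compile(r"^\*\*[^*\n]+:\*\*[ \t]*", re.MULTILINE)


def speeches_to_body(speeches: List[Dict[str, str]]) -> str:
    """Join speech turns into one string, one turn per paragraph."""
    turns = []
    for speech in speeches:
        actor = speech.get("actor", "").strip()  # hypothetical key
        text = speech.get("text", "").strip()    # hypothetical key
        turns.append(f"**{actor}:** {text}" if actor else text)
    return "\n\n".join(turns)


def convert_memo(memo: Dict[str, Any], memo_type: str = "interview") -> Dict[str, Any]:
    """Return a converted copy; `speeches` is kept as a safety net."""
    converted = copy.deepcopy(memo)
    converted["type"] = memo_type
    converted["body"] = speeches_to_body(memo.get("speeches", []))
    if "corpus" in converted:
        converted["diary"] = converted.pop("corpus")
    return converted


def strip_actors(body: str) -> str:
    """Remove actor prefixes so lexicometrics only sees what was said."""
    return ACTOR_PREFIX.sub("", body)
```

Keeping `speeches` in the converted copy means a faulty conversion can be redone later, at the cost of storing the content twice.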


* Interview memos replace 'legacy' text memos (untyped but known as transcripts). Unlike transcripts, interviews are shaped like other memos and use the same edit form.
* Interviews borrow the transcripts icon. The icon for transcripts becomes light gray.
* The button that previously created transcripts now creates interviews.
* Lexicometrics indicators are computed for interviews only.
* Search and KWICs still work for (previously created) transcripts.
* The edit form for transcripts is replaced by a script that converts transcripts into interviews. Any attempt to edit legacy text memos thus triggers the conversion.

christophe-lejeune · 2020-04-07T07:10:24Z

Before integrating this refactoring into master, it should be tested on texts that have been highlighted with LaSuli (prior to conversion). The conversion should not affect LaSuli coordinates.

In particular, these aspects should be examined:

The trailing colon (:) after each actor.
The MarkDown highlight of the actor (**).
The space (\n \n) between speech turns.

The conversion conventions would be adapted if required.

Some users reach `editable_text` URLs through their browser history. This unexpected behaviour triggers the conversion script, sometimes for memos that have already been converted to the interview type. In such a case, the resulting memo loses its `diary` field. To avoid this, the conversion now occurs only if no `body` field is present.
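The guard can be pictured as follows. This is a sketch under assumptions, not the actual script: `open_editable_text` and the `convert` callback are hypothetical names, and only the `body` and `diary` fields come from the description above.

```python
from typing import Any, Callable, Dict

Memo = Dict[str, Any]


def open_editable_text(memo: Memo, convert: Callable[[Memo], Memo]) -> Memo:
    """Convert a memo only if it has no `body` field yet."""
    if "body" in memo:
        # Already converted: return it untouched so `diary` is preserved.
        return memo
    return convert(memo)
```

A memo that already has a `body` comes back untouched, so revisiting an old URL no longer strips its `diary` field.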

This feature deals with issues #99 and #104

Let's celebrate the 600th commit in the Cassandre GitHub repository! This fixes issues resulting from the layout removal (for interview transcripts) introduced by commit b156198:

* First paragraph of an interview can be coded as a fragment (#116).
* Interview memos exclude paragraph indents. This continues the refactoring of interview transcripts (#104).
* Memo content excludes tab characters (\t).
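One possible reading of the indent and tab exclusions is the normalisation sketched below. The function name is hypothetical, and replacing each tab with a single space (rather than deleting it) is an assumption; the commit message does not say which was chosen.

```python
def normalize_interview_body(body: str) -> str:
    """Drop paragraph indents and tab characters from a memo body."""
    cleaned = []
    for line in body.split("\n"):
        line = line.replace("\t", " ")    # assumption: tab becomes one space
        cleaned.append(line.lstrip(" "))  # remove the leading indent
    return "\n".join(cleaned)
```

Line breaks are left as they are, so the number of paragraphs does not change.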

christophe-lejeune · 2022-03-07T12:05:30Z

As of b156198, MarkDown layout is disabled. Actors are thus no longer highlighted.

christophe-lejeune added the Refactoring label Mar 28, 2020

christophe-lejeune added a commit that referenced this issue Apr 20, 2020

FIX: Repeated phrases are computed only for interviews (#104)

09f3124

christophe-lejeune added a commit that referenced this issue Jul 15, 2020

FEATURE: Autosaving a memo when it is edited

ba8d63b

This feature deals with issues #99 and #104

christophe-lejeune closed this as completed Mar 7, 2022
