Refactoring interview transcripts #104

Closed
5 of 6 tasks
christophe-lejeune opened this issue Mar 28, 2020 · 2 comments

christophe-lejeune commented Mar 28, 2020

The transcript editing form contains an unexplained bug. When editing, speech turns may be duplicated or disappear. We have been able to reproduce the bug but, despite investigating for years, we have been unable to understand its cause. The problem has therefore not been fixed.

To prevent the problem from occurring, the transcript editing form needs to be refactored. At this stage, I propose giving interview transcripts an editing form similar to that of other types of memo (more advanced solutions may be envisioned later). Some history of the software puts this into perspective: the transcript type is the legacy format for text, inherited from the first version of Cassandre, when a “corpus” consisted only of “texts” (with no other type of memo). As a consequence, memos currently considered as transcripts are "untyped" (they do not contain a type field) and their field referring to the diary is named corpus.

A new type of memo would be introduced. It would differ from the current "untyped" memos as follows:

  • It has a type field (set to interview).
  • The corpus field is renamed diary.
  • Like other types of memo, its content is stored in a “body” field (a string, laid out in Markdown).
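
As a rough illustration of the difference, the two shapes can be sketched as Python dictionaries. Only the `type`, `corpus`, `diary`, `speeches` and `body` fields come from the description above; the `_id` and `name` values, the `actor`/`text` keys inside each speech, and the `**actor:**` layout of the body are assumptions made for the example.

```python
# Legacy "untyped" transcript: no "type" field, diary referenced by "corpus".
legacy_memo = {
    "_id": "memo-1",                # hypothetical identifier
    "name": "Interview with A.",    # hypothetical title
    "corpus": "diary-42",           # hypothetical diary identifier
    "speeches": [                   # array of objects (keys are assumed)
        {"actor": "Interviewer", "text": "How did you start?"},
        {"actor": "A.", "text": "By accident."},
    ],
}

# Proposed interview memo: typed, "diary" instead of "corpus", Markdown "body".
interview_memo = {
    "_id": "memo-1",
    "name": "Interview with A.",
    "type": "interview",
    "diary": "diary-42",
    "body": "**Interviewer:** How did you start?\n\n**A.:** By accident.",
}


def is_legacy_transcript(memo: dict) -> bool:
    """A memo counts as a legacy transcript when it has no "type" field."""
    return "type" not in memo
```

Testing for the absence of `type` mirrors the definition of "untyped" given above; other criteria are possible.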

This refactoring would follow these steps:

  • As a transitional measure, new transcripts will be created as field memos (b40ae8e).
  • All views and lists dealing with types of memo will be adapted to the new type of memo.
  • When the platform is ready for the new type of transcript, the button used to create transcripts can create new interview memos (instead of field memos).
  • Lexicometric features (repeated segments, rare and specific words) will be refactored to process the body string instead of the speeches array of objects. As before, lexicometric features will only be computed for transcripts.
  • It should be possible to adapt lexicometric features so that "actors" (at the beginning of each line) are excluded from the computation.
  • Either the editing form or the template used to render the transcript will be provided with code that converts former "untyped" memos into new "interview" memos, so that each edit of a transcript results in a conversion from the older format to the new one. This conversion includes inserting the “interview” type and converting the speeches field (an array of objects) into a body field (a string, laid out in Markdown). The code currently used to generate the “preview” could be reused. To guard against errors that may occur during conversion, the speeches field could perhaps be kept in the converted document (but ignored when the document is rendered).
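
The conversion step could look roughly like the following sketch. It is not the code used in Cassandre: the function names, the `actor`/`text` keys inside each speech, and the `**actor:** text` layout with a blank line between turns are assumptions, and the `interview` type value follows the field description given earlier in this issue.

```python
import copy


def speeches_to_body(speeches):
    """Render speech turns as one Markdown string, one paragraph per turn."""
    return "\n\n".join(f"**{turn['actor']}:** {turn['text']}" for turn in speeches)


def convert_to_interview(memo):
    """Return a converted copy of a legacy memo; the input is left untouched."""
    converted = copy.deepcopy(memo)
    converted["type"] = "interview"
    if "corpus" in converted:
        # Rename the field referring to the diary.
        converted["diary"] = converted.pop("corpus")
    converted["body"] = speeches_to_body(converted.get("speeches", []))
    # "speeches" is deliberately kept as a safety net; rendering would ignore it.
    return converted


legacy = {
    "corpus": "diary-42",  # hypothetical diary identifier
    "speeches": [
        {"actor": "Interviewer", "text": "How did you start?"},
        {"actor": "A.", "text": "By accident."},
    ],
}
converted = convert_to_interview(legacy)
print(converted["body"])
```

Working on a copy and keeping `speeches` means a faulty conversion can still be inspected or redone from the original turns.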
christophe-lejeune added a commit that referenced this issue Apr 5, 2020
* Interview memos replace 'legacy' text memos (untyped but known as transcripts). Unlike transcripts, interviews are shaped like other memos and use the same edit form.
* Interviews borrow the transcripts icon. The icon for transcripts becomes light gray.
* The button previously creating transcripts now creates interviews.
* Lexicometrics indicators are computed for interviews only.
* Search and KWICs still work for (previously created) transcripts.
* Edit form for transcripts is replaced by a script that converts transcripts into interviews. Any attempt to edit legacy text memos thus triggers the conversion.
christophe-lejeune commented

Before integrating this refactoring into master, it should be tested on texts that were highlighted with LaSuli prior to conversion. The conversion should not affect LaSuli coordinates.

In particular, these aspects should be examined:

  • The trailing colon (:) after each actor.
  • The Markdown highlight of the actor (**).
  • The blank line (\n \n) between speech turns.

The conversion conventions would be adapted if required.
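
To experiment with these three conventions, a small parser can split a converted body back into (actor, text) turns. The sketch below assumes the layout `**actor:** text` with a blank (possibly space-only) line between turns; this layout, the regular expression and the function names are illustrative and not taken from the Cassandre code. The same pattern could serve to leave actors out of lexicometric counts.

```python
import re

# One turn: bold actor with a trailing colon, then the spoken text.
TURN = re.compile(r"^\*\*(?P<actor>[^*\n]+):\*\* ?(?P<text>.*)$", re.DOTALL)


def split_turns(body):
    """Split a body into (actor, text) pairs; actor is None if no prefix is found."""
    turns = []
    # Turns are separated by an empty line, which may contain spaces or tabs.
    for block in re.split(r"\n[ \t]*\n", body.strip()):
        match = TURN.match(block)
        if match:
            turns.append((match.group("actor"), match.group("text")))
        else:
            turns.append((None, block))
    return turns


def text_without_actors(body):
    """Keep only what was said, for example before counting words."""
    return "\n\n".join(text for _, text in split_turns(body))


sample = "**Interviewer:** How did you start?\n \n**A.:** By accident."
print(split_turns(sample))
```

Removing the actor prefixes shifts character positions, so the stripped text is suited to counting words rather than to locating highlighted fragments.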

christophe-lejeune added a commit that referenced this issue Apr 21, 2020
Some users visit `editable_text` URLs through their browser history. This unexpected behaviour triggers the conversion script, sometimes for memos already converted to the interview type. In that case, the resulting memo loses its `diary` field. To avoid this, the conversion occurs only if no `body` field is present.
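
The effect of this guard can be sketched as follows. `naive_convert` is a hypothetical stand-in for the conversion script, not the actual code; the `actor`/`text` keys and the body layout are assumptions as well. Only the test on the `body` field comes from the commit message.

```python
def naive_convert(memo):
    """Conversion that assumes a legacy memo (reads "corpus" and "speeches")."""
    speeches = memo.get("speeches", [])
    return {
        "type": "interview",
        "diary": memo.get("corpus"),  # None when "corpus" is already gone
        "body": "\n\n".join(f"**{s['actor']}:** {s['text']}" for s in speeches),
        "speeches": speeches,
    }


def convert_if_needed(memo):
    """Convert only when no "body" field is present."""
    if "body" in memo:
        return memo
    return naive_convert(memo)


legacy = {
    "corpus": "diary-42",  # hypothetical diary identifier
    "speeches": [{"actor": "A.", "text": "By accident."}],
}
once = convert_if_needed(legacy)
twice = convert_if_needed(once)       # second visit: memo returned unchanged
unguarded = naive_convert(once)       # without the guard, the diary is lost
```

With the guard, visiting the old URL a second time leaves the memo as it is, so the conversion becomes safe to trigger repeatedly.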
christophe-lejeune added a commit that referenced this issue Jul 15, 2020
This feature deals with issues #99 and #104
christophe-lejeune added a commit that referenced this issue Nov 10, 2021
Let's celebrate the 600th commit in the Cassandre GitHub repository!

This fixes issues resulting from the layout removal (for interview transcripts) made in commit b156198

* First paragraph of an interview can be coded as a fragment (#116).
* Interview memos exclude paragraph indents. This continues the refactoring of interview transcripts (#104).
* Memo content excludes tab characters (\t).
christophe-lejeune commented

Since b156198, Markdown layout is disabled. Actors are thus no longer highlighted.
