Releasing version 3.05 #715

Closed
amitdo opened this issue Feb 10, 2017 · 24 comments

Comments

@amitdo
Collaborator

amitdo commented Feb 10, 2017

From #707:

@amitdo commented:

I also think we should release a last 3.0x version in the upcoming 2-6 weeks.

@zdenop commented:

If 3.05 is to be the last version with the legacy OCR engine (old engine), then there should be a possibility to read the OCR result from memory.

@jbreiden commented:

https://wiki.ubuntu.com/ZestyZapus/ReleaseSchedule

Feb 16 is the final deadline for changes to Ubuntu 17.04. I am not comfortable shipping anything from 4.x to these users, but we can consider taking a snapshot of the 3.0.5 branch. It does have some bug and compatibility fixes that are good for users. Regarding training data, I would not ship an update to that at all. This would purely be a code update.

I know the long-standing issue has been restoring an API call (last seen in version 3.0.2) to send results to memory instead of a file. I respect that idea, but we don't have it, and it's not that easy to add. I think it is fair to say that it would be impossible before the deadline. So the question is: do we ship an update to users this cycle or not? And if so, should I take a snapshot? And if so, what would it be called?

A few more thoughts that are somewhat related

  • I see no reason that this has to be the last ever release on the 3.0.x branch.
  • My guess is that by the release after next, in Oct 2017, 4.x will be ready for the vast majority of users
  • I'm not planning to ship both 3.0.x and 4.x at the same time with Debian/Ubuntu. I think it will be very rare for people to want both, and those who do will be advanced users who can work from source code.

@Shreeshrii commented:

@jbreiden Good idea to do a code update for 3.05 for Ubuntu 17.04. There are a number of bug fixes and changes and it would be good to get them out to the users. Thanks!

@amitdo
Collaborator Author

amitdo commented Feb 10, 2017

I added Release Notes for 3.05, based on the 3.05 branch.

@amitdo
Collaborator Author

amitdo commented Feb 14, 2017

@zdenop, @egorpugin

Some minor fixes to 3.05:

README.md

The latest stable version is 3.04.01, released in February 2016.

=>

The latest stable version is 3.05, released in February 2017.

tesseract/api/baseapi.h
#define TESSERACT_VERSION_STR "3.05.00dev"
=>
#define TESSERACT_VERSION_STR "3.05.00"

configure.ac
AC_INIT([tesseract], [3.05.00dev], [https://github.com/tesseract-ocr/tesseract/issues])
=>
AC_INIT([tesseract], [3.05.00], [https://github.com/tesseract-ocr/tesseract/issues])

PACKAGE_YEAR=2015
PACKAGE_DATE="07/11"

=>

PACKAGE_YEAR=2017
PACKAGE_DATE="02/14"

ChangeLog

See ReleaseNotes

AUTHORS
Please add Nick White to the Community Contributors.

CONTRIBUTING.md
bf9f40cac631
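
The version bumps listed above touch two files that are expected to carry the same string. A minimal sketch of a pre-release check follows, assuming the `#define` and `AC_INIT` lines keep exactly the shape quoted above. The helper names (`header_version`, `configure_version`, `ready_for_release`) are hypothetical and not part of the Tesseract build.

```python
import re

def header_version(text):
    """Extract the string from a TESSERACT_VERSION_STR #define, or None."""
    m = re.search(r'#define\s+TESSERACT_VERSION_STR\s+"([^"]+)"', text)
    return m.group(1) if m else None

def configure_version(text):
    """Extract the version argument of AC_INIT([tesseract], [...]), or None."""
    m = re.search(r'AC_INIT\(\[tesseract\],\s*\[([^\]]+)\]', text)
    return m.group(1) if m else None

def ready_for_release(header_text, configure_text):
    """True when both files agree and the version has no 'dev' suffix."""
    h = header_version(header_text)
    c = configure_version(configure_text)
    return h is not None and h == c and not h.endswith("dev")

# The two lines as quoted in the comment above.
header = '#define TESSERACT_VERSION_STR "3.05.00"'
configure = ('AC_INIT([tesseract], [3.05.00], '
             '[https://github.com/tesseract-ocr/tesseract/issues])')
print(ready_for_release(header, configure))
```

In practice the two strings would be read from `api/baseapi.h` and `configure.ac` rather than hard-coded; the literals here only keep the sketch self-contained.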

@egorpugin
Contributor

You could create a PR with these changes. :)

@jbreiden
Contributor

Thinking about this, I have never had an easy time doing a new Tesseract release for Debian. Maybe shipping a new version right at the cutoff date is not such a smart idea.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=794489
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816857
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815056
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815860
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815970

@amitdo
Collaborator Author

amitdo commented Feb 14, 2017

So, when?

@amitdo
Collaborator Author

amitdo commented Feb 14, 2017

Maybe you want to use this strategy:

  1. Make one unified deb package for Debian 9, Ubuntu 16.04, Ubuntu 17.04.
  2. Push it to Debian & Ubuntu 'backports' repos. For Debian 9 and Ubuntu 17.04 you will need to wait until they are released. For Ubuntu 16.04 you can start when you want to.

@Shreeshrii
Collaborator

Shreeshrii commented Feb 16, 2017

@zdenop

I think you should tag the 3.05 branch with the 3.05.0 release. It can be updated in case there are any additional changes.

@zdenop
Contributor

zdenop commented Feb 16, 2017

@zdenop zdenop closed this as completed Feb 16, 2017
@jbreiden jbreiden reopened this Feb 18, 2017
@jbreiden
Contributor

jbreiden commented Feb 18, 2017

Being super cautious, I am comparing symbols from libtesseract.so.3.0.4 and libtesseract.so.3.0.5. Here are symbols that disappeared in 3.0.5.

-_Z14WriteParamDescP8_IO_FILEtP10PARAM_DESC@Base
-_Z9read_listPKc@Base
-_ZN16GENERIC_2D_ARRAYIN9tesseract17TrainingSampleSet13FontClassInfoEE6ResizeEiiRKS2_@Base
-_ZN8WERD_RES19FakeWordFromRatingsEv@Base
-_ZN9tesseract11ObjectCacheINS_4DawgEEC1Ev@Base
-_ZN9tesseract11ObjectCacheINS_4DawgEEC2Ev@Base
-_ZN9tesseract13DocumentCache13LoadDocumentsERK13GenericVectorI6STRINGEPKcPFbRKS2_PS1_IcEE@Base
-_ZN9tesseract13DocumentCache15GetPageBySerialEi@Base
-_ZN9tesseract17ViterbiStateEntryD1Ev@Base
-_ZN9tesseract17ViterbiStateEntryD2Ev@Base
-_ZN9tesseract18DawgPositionVectorD1Ev@Base
-_ZN9tesseract18DawgPositionVectorD2Ev@Base
-_ZN9tesseract4Dict4LoadEPNS_9DawgCacheE@Base
-_ZNK9tesseract4Dict19ProcessPatternEdgesEPKNS_4DawgERKNS_12DawgPositionEibPNS_18DawgPositionVectorEP12PermuterType@Base
-_ZNK9tesseract9ImageData8PreScaleEiPP3PixPiS4_P13GenericVectorI4TBOXE@Base

3.0.4.symbols.amd64.txt
3.0.5.symbols.amd64.txt
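
The comparison described above amounts to a set difference over two symbol lists. A minimal sketch, assuming each list holds one mangled name per line, optionally followed by an `@Base`-style tag as in the output above (the real layout of the attached files is not shown in this thread, so that format is an assumption). The function names are hypothetical, and the sample data uses a few names from this issue plus `TessBaseAPICreate` as an illustrative unchanged symbol.

```python
def normalize(line):
    """Strip whitespace and any trailing '@Base'-style version tag."""
    return line.strip().split("@", 1)[0]

def diff_symbols(old_lines, new_lines):
    """Return (removed, added) symbol names as sorted lists."""
    old = {normalize(l) for l in old_lines if l.strip()}
    new = {normalize(l) for l in new_lines if l.strip()}
    return sorted(old - new), sorted(new - old)

# Tiny illustrative inputs; the real symbol lists are far longer.
old_syms = [
    "_Z9read_listPKc@Base",
    "_ZN8WERD_RES19FakeWordFromRatingsEv@Base",
    "TessBaseAPICreate@Base",
]
new_syms = [
    "TessBaseAPICreate@Base",
    "TessBaseAPIDetectOrientationScript@Base",
]

removed, added = diff_symbols(old_syms, new_syms)
for name in removed:
    print("-" + name)
for name in added:
    print("+" + name)
```

Only the removed set matters for existing binaries: a program linked against the old library can still resolve every symbol it needs as long as nothing it uses has disappeared.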

@jbreiden
Contributor

jbreiden commented Feb 18, 2017

Now I want to test against gimagereader, but I can't from my chroot jail. It fails on both 3.0.4 and 3.0.5:

$ /usr/bin/gimagereader-gtk
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)

@jbreiden
Contributor

jbreiden commented Feb 18, 2017

This is the release candidate for 3.0.5 for Debian. I'm going to do a compatibility check with the gimagereader package maintainer. Others are also very welcome to test if they are set up for it.

tesseract-3.0.5-candidate.zip

@jbreiden
Contributor

jbreiden commented Feb 18, 2017

EDITED

Okay, I got gimagereader working for 3.0.4 in the Debian Sid chroot jail.

https://help.ubuntu.com/community/BasicChroot#Accessing_graphical_applications_inside_the_chroot

However, it crashes during recognition on 3.0.5. This is automatic stop-ship with respect to Debian.

crashlog.txt

@Shreeshrii
Collaborator

Shreeshrii commented Feb 18, 2017

@manisandro

Have you tried building gImageReader with 3.05 release candidate?

manisandro/gImageReader#156

@jbreiden
Contributor

I'd like to figure things out such that gimagereader doesn't need to be rebuilt.

@innir
Contributor

innir commented Feb 19, 2017

Hi,
I'm the maintainer of gimagereader in Debian. I just checked with the debs @jbreiden provided. Gimagereader crashes when compiled against 3.0.4 but run with 3.0.5, see https://gist.github.com/innir/a4662ad7043c9cc27e9f7bdaff8f8acf.

Recompiling gImageReader against 3.0.5 resolves the crash.

This is really easy to fix upstream. You just have to bump the SONAME if your symbols change! Not too hard to get! Please make a 3.0.5.1 release and bump the SONAME!

Best,
Philip

@innir
Contributor

innir commented Feb 19, 2017

@manisandro no problem on your side, gImageReader works fine with 3.0.5 :)

@manisandro

Thanks for checking @innir , I haven't yet had time to build 3.0.5.

@jbreiden
Contributor

jbreiden commented Feb 20, 2017

Upstream was hoping 3.0.5 would be application binary interface (ABI) compatible with 3.0.4. Am I reading the crashlog correctly that the unhappy symbol is RecogAllWordsPassN? Because I don't think that has changed. Is there a preferred ABI compatibility checking tool? I ask because the other approach is fixing compatibility and releasing a 3.0.5.1 that does not bump the SONAME.

@egorpugin
Contributor

Hi,

I've heard about this ABI tracker. Maybe it's possible to add tesseract (& leptonica) there somehow.
https://lvc.github.io/abi-compliance-checker/
https://abi-laboratory.pro/tracker/
https://abi-laboratory.pro/tracker/timeline/qt/ (qt example)

@innir
Contributor

innir commented Feb 20, 2017

@jbreiden I think it's in Recognize() or one of the functions Recognize() calls. Hard to tell as the next two lines in the crash log are empty :(
Anyway, if symbols are removed, the SONAME has to be bumped; if symbols are only added, that's not necessary, AFAIR. Those ABI trackers are good, but I'm not sure if they catch things like this: https://github.com/tesseract-ocr/tesseract/pull/259/files

@manisandro

abipkgdiff output (comparing 3.04.01 and 3.05.00):

================ changes of libtesseract.so.3.0.4 ===============
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 11 Removed, 41 Added function symbols not referenced by debug info
Variable symbols changes summary: 0 Removed, 3 Added variable symbols not referenced by debug info

11 Removed function symbols not referenced by debug info:

    WriteParamDesc(_IO_FILE*, unsigned short, PARAM_DESC*)
    read_list(char const*)
    GENERIC_2D_ARRAY<tesseract::TrainingSampleSet::FontClassInfo>::Resize(int, int, tesseract::TrainingSampleSet::FontClassInfo const&)
    WERD_RES::FakeWordFromRatings()
    tesseract::DocumentCache::LoadDocuments(GenericVector<STRING> const&, char const*, bool (*)(STRING const&, GenericVector<char>*))
    tesseract::DocumentCache::GetPageBySerial(int)
    tesseract::DawgPositionVector::~DawgPositionVector(), aliases tesseract::DawgPositionVector::~DawgPositionVector()
    tesseract::DawgPositionVector::~DawgPositionVector()
    tesseract::Dict::Load(tesseract::DawgCache*)
    tesseract::Dict::ProcessPatternEdges(tesseract::Dawg const*, tesseract::DawgPosition const&, int, bool, tesseract::DawgPositionVector*, PermuterType*) const
    tesseract::ImageData::PreScale(int, Pix**, int*, int*, GenericVector<TBOX>*) const

41 Added function symbols not referenced by debug info:

    TessBaseAPIDetectOrientationScript
    WriteParamDesc(_IO_FILE*, unsigned short, PARAM_DESC const*)
    GenericVector<tesseract::DawgPosition>::clear()
    STRING::SkipDeSerialize(bool, tesseract::TFile*)
    WERD_RES::FakeWordFromRatings(PermuterType)
    tesseract::TessBaseAPI::GetTSVText(int)
    tesseract::TessBaseAPI::GetHOCRText(ETEXT_DESC*, int)
    tesseract::TessBaseAPI::AnalyseLayout()
    tesseract::TessBaseAPI::DetectOrientationScript(int*, float*, char const**, float*)
    tesseract::ColPartition::SortByBBox(void const*, void const*)
    tesseract::DocumentData::SetDocument(char const*, char const*, long long, bool (*)(STRING const&, GenericVector<char>*))
    tesseract::DocumentData::IsPageAvailable(int, tesseract::ImageData**)
    tesseract::DocumentData::LoadPageInBackground(int)
    tesseract::DocumentData::UnCache()
    tesseract::DocumentCache::TotalPages()
    tesseract::DocumentCache::LoadDocuments(GenericVector<STRING> const&, char const*, tesseract::CachingStrategy, bool (*)(STRING const&, GenericVector<char>*))
    tesseract::DocumentCache::GetPageRoundRobin(int)
    tesseract::DocumentCache::GetPageSequential(int)
    tesseract::DocumentCache::CountNeighbourDocs(int, int)
    tesseract::ParamsVectors::~ParamsVectors()
    tesseract::ParamsVectors::~ParamsVectors(), aliases tesseract::ParamsVectors::~ParamsVectors()
    tesseract::TessTsvRenderer::AddImageHandler(tesseract::TessBaseAPI*)
    tesseract::TessTsvRenderer::EndDocumentHandler()
    tesseract::TessTsvRenderer::BeginDocumentHandler()
    tesseract::TessTsvRenderer::TessTsvRenderer(char const*), aliases tesseract::TessTsvRenderer::TessTsvRenderer(char const*)
    tesseract::TessTsvRenderer::TessTsvRenderer(char const*, bool)
    tesseract::TessTsvRenderer::TessTsvRenderer(char const*)
    tesseract::TessTsvRenderer::TessTsvRenderer(char const*, bool), aliases tesseract::TessTsvRenderer::TessTsvRenderer(char const*, bool)
    tesseract::TessTsvRenderer::~TessTsvRenderer()
    tesseract::TessTsvRenderer::~TessTsvRenderer()
    tesseract::TessTsvRenderer::~TessTsvRenderer(), aliases tesseract::TessTsvRenderer::~TessTsvRenderer()
    tesseract::ReCachePagesFunc(void*)
    tesseract::Dict::FinishLoad()
    tesseract::Dict::SetupForLoad(tesseract::DawgCache*)
    tesseract::Dict::Load(char const*, STRING const&)
    tesseract::DawgCache::~DawgCache(), aliases tesseract::DawgCache::~DawgCache()
    tesseract::DawgCache::~DawgCache()
    tesseract::ImageData::SkipDeSerialize(bool, tesseract::TFile*)
    tesseract::Dict::ProcessPatternEdges(tesseract::Dawg const*, tesseract::DawgPosition const&, int, bool, tesseract::DawgArgs*, PermuterType*) const
    tesseract::Dict::IsSpaceDelimitedLang() const
    tesseract::ImageData::PreScale(int, int, float*, int*, int*, GenericVector<TBOX>*) const

3 Added variable symbols not referenced by debug info:

    typeinfo for tesseract::TessTsvRenderer
    typeinfo name for tesseract::TessTsvRenderer
    vtable for tesseract::TessTsvRenderer
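
The summary lines at the top of this report can be reduced to the rule stated earlier in the thread (removed symbols require a SONAME bump, added ones alone do not). A minimal sketch, assuming the summary lines keep the wording shown above; the function names are hypothetical, and "Changed" counts are deliberately ignored here because the rule as quoted only mentions removals and additions.

```python
import re

def count(pattern, line):
    """Return the integer captured by pattern in line, or 0 if absent."""
    m = re.search(pattern, line)
    return int(m.group(1)) if m else 0

def summarize(report):
    """Sum the Removed and Added counts over all 'changes summary' lines."""
    removed = added = 0
    for line in report.splitlines():
        if "changes summary:" not in line:
            continue
        removed += count(r"(\d+) Removed", line)
        added += count(r"(\d+) Added", line)
    return removed, added

def soname_bump_needed(report):
    """Apply the thread's rule: any removed symbol means a bump."""
    removed, _ = summarize(report)
    return removed > 0

# Summary lines copied from the report above.
report = """\
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 11 Removed, 41 Added function symbols not referenced by debug info
Variable symbols changes summary: 0 Removed, 3 Added variable symbols not referenced by debug info
"""

print(summarize(report))
print(soname_bump_needed(report))
```

A stricter check would also treat a non-zero "Changed" count as breaking, since a changed signature or layout can break callers even when no symbol name disappears.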

@zdenop
Contributor

zdenop commented Mar 24, 2017

Is there anything we can do (in tesseract project) to fix this issue?

@lvc

lvc commented Mar 28, 2017

I've heard about this ABI tracker. Maybe it's possible to add tesseract (& leptonica) there somehow.

Done:

https://abi-laboratory.pro/tracker/timeline/tesseract/
https://abi-laboratory.pro/tracker/timeline/leptonica/

About the tracker: https://abi-laboratory.pro/index.php?view=abi-tracker

@amitdo
Collaborator Author

amitdo commented May 7, 2017

@amitdo amitdo closed this as completed Sep 11, 2017

8 participants