
feat: add csv_parser to parse legacy OTA image's metadata CSV files into sqlite3 db #466

Merged (15 commits) on Dec 26, 2024

@Bodong-Yang (Member) commented Dec 20, 2024

Introduction

Note

This PR is split from #453. It only introduces the implementation of csv_parser and does not yet integrate it into the otaclient code; the existing legacy OTA image parser implementation is also left unchanged in this PR.

This PR implements parsing of the legacy OTA image's metadata CSV files into file_table entries, and importing the parsed results into sqlite3 tables.
NOTE that this implementation is not yet integrated into the otaclient code; it will be integrated in a future PR.
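As a rough illustration of the pipeline described above (parse each metadata line into an entry, then import the entries into sqlite3), here is a minimal sketch. It is not the csv_parser from this PR: the line layout `mode,uid,gid,'path'`, the `DirEntry` type, the helper names, and the `ft_directory` table are all hypothetical, chosen only to show the shape of the approach.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical line layout: <octal mode>,<uid>,<gid>,'<path>'
# (an assumption for illustration, not the confirmed legacy format).
_DIR_LINE = re.compile(r"(?P<mode>\d+),(?P<uid>\d+),(?P<gid>\d+),'(?P<path>.*)'")


@dataclass
class DirEntry:  # hypothetical file_table-like entry
    path: str
    mode: int
    uid: int
    gid: int


def parse_dir_line(line: str) -> DirEntry:
    ma = _DIR_LINE.fullmatch(line.strip())
    if not ma:
        raise ValueError(f"invalid line: {line!r}")
    return DirEntry(
        path=ma.group("path"),
        mode=int(ma.group("mode"), 8),  # mode is stored as octal text
        uid=int(ma.group("uid")),
        gid=int(ma.group("gid")),
    )


def parse_dir_lines(lines: Iterable[str]) -> Iterator[DirEntry]:
    # Stream line by line so a large metadata file never has to fit in memory.
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            yield parse_dir_line(line)
        except ValueError as e:
            raise ValueError(f"line {lineno}: {e}") from None


def import_dirs(conn: sqlite3.Connection, entries: Iterable[DirEntry]) -> None:
    # Hypothetical table name and schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ft_directory "
        "(path TEXT PRIMARY KEY, mode INTEGER, uid INTEGER, gid INTEGER)"
    )
    with conn:  # commits once when the whole batch has been inserted
        conn.executemany(
            "INSERT INTO ft_directory VALUES (?, ?, ?, ?)",
            ((e.path, e.mode, e.uid, e.gid) for e in entries),
        )


conn = sqlite3.connect(":memory:")
import_dirs(conn, parse_dir_lines(["0755,0,0,'/'", "0700,1000,1000,'/home/user'"]))
```

Feeding a generator into `executemany` keeps parsing and importing in a single streaming pass, and the line number in the error message makes a malformed metadata file easier to diagnose.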

@Bodong-Yang Bodong-Yang added the feature New feature is introduced label Dec 20, 2024

github-actions bot (Contributor) commented Dec 20, 2024

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
src/ota_metadata/file_table
   __init__.py | 4 | 0 | 100%
   _orm.py | 16 | 0 | 100%
   _table.py | 91 | 14 | 84% | 139–140, 178, 197–202, 204–206, 218–219
   _types.py | 31 | 4 | 87% | 47, 54–56
src/ota_metadata/legacy
   __init__.py | 1 | 0 | 100%
   csv_parser.py | 99 | 6 | 93% | 100, 153–155, 160, 272
   parser.py | 335 | 48 | 85% | 106, 170, 175, 211–212, 222–223, 226, 238, 289–291, 295–298, 324–327, 396, 399, 407–409, 422, 431–432, 435–436, 601–603, 653–654, 657, 685–686, 689–690, 692, 696, 698–699, 753, 756–758
   rs_table.py | 35 | 10 | 71% | 75, 87–95
   types.py | 84 | 13 | 84% | 37, 40–42, 112–116, 122–125
src/ota_metadata/utils
   cert_store.py | 86 | 8 | 90% | 58–59, 73, 87, 91, 102, 123, 127
src/ota_proxy
   __init__.py | 16 | 7 | 56% | 48–49, 51, 53, 62, 72–73
   __main__.py | 6 | 6 | 0% | 16, 18–19, 21–22, 24
   _consts.py | 17 | 0 | 100%
   cache_control_header.py | 68 | 4 | 94% | 71, 91, 113, 121
   cache_streaming.py | 143 | 13 | 90% | 209, 223, 227–228, 263–264, 266, 278, 347, 365–368
   config.py | 20 | 0 | 100%
   db.py | 80 | 18 | 77% | 103, 109, 167, 173–174, 177, 183, 185, 209–216, 218–219
   errors.py | 5 | 0 | 100%
   external_cache.py | 28 | 20 | 28% | 31, 35, 40–42, 44–45, 48–49, 51–53, 60, 63–65, 69–72
   lru_cache_helper.py | 46 | 2 | 95% | 85–86
   ota_cache.py | 234 | 64 | 72% | 72–73, 140, 143–144, 156–157, 189, 192, 214, 234, 253–257, 261–263, 265, 267–274, 276–278, 281–282, 286–287, 291, 338, 346–348, 421, 448, 451–452, 474–476, 480–482, 488, 490–492, 497, 523–525, 559–561, 588, 594, 609
   server_app.py | 140 | 39 | 72% | 71, 74, 80, 99, 103, 162, 171, 213–214, 216–218, 221, 226–227, 230, 233–234, 237, 240, 243, 246, 259–260, 263–264, 266, 269, 295–298, 301, 315–317, 323–325
   utils.py | 13 | 0 | 100%
src/otaclient
   __init__.py | 5 | 2 | 60% | 17, 19
   __main__.py | 1 | 1 | 0% | 16
   _logging.py | 51 | 33 | 35% | 43–44, 46–47, 49–54, 56–57, 59–60, 62–65, 67, 77, 80–82, 84–86, 89–90, 92–96
   _otaproxy_ctx.py | 42 | 42 | 0% | 20, 22–29, 31–36, 38, 40–41, 44, 46–50, 53–56, 59–60, 62–63, 65–67, 69, 74–78, 80
   _status_monitor.py | 184 | 17 | 90% | 48–49, 136–138, 161, 164, 184, 187, 203–204, 212, 215, 278, 300, 325–326
   _types.py | 96 | 0 | 100%
   _utils.py | 30 | 2 | 93% | 80–81
   errors.py | 120 | 1 | 99% | 97
   main.py | 25 | 25 | 0% | 17, 19–29, 31–33, 35, 37, 41–42, 44–46, 48–50
   ota_core.py | 355 | 149 | 58% | 127, 129–130, 134–135, 137–139, 143–144, 149–150, 156, 158, 217–220, 343, 375–376, 378, 387, 390, 395–396, 399, 405, 407–411, 418, 424, 459–462, 465–476, 479–482, 523–526, 542–543, 547–548, 614–621, 626, 629–636, 661–662, 668, 672–673, 679, 696–698, 700, 721–722, 730–731, 768, 790, 817–819, 828–834, 848–854, 856–857, 862–863, 871–874, 876–877, 885, 887, 893, 895, 901, 903, 907, 913, 915, 921, 924–926, 936–937, 948–950, 952–953, 955, 957–958, 963, 965, 970
src/otaclient/boot_control
   __init__.py | 4 | 0 | 100%
   _firmware_package.py | 93 | 22 | 76% | 82, 86, 136, 180, 186, 209–210, 213–218, 220–221, 224–229, 231
   _grub.py | 418 | 127 | 69% | 214, 262–265, 271–275, 312–313, 320–325, 328–334, 337, 340–341, 346, 348–350, 359–365, 367–368, 370–372, 381–383, 385–387, 466–467, 471–472, 524, 530, 556, 578, 582–583, 598–600, 624–627, 639, 643–645, 647–649, 708–711, 736–739, 762–765, 777–778, 781–782, 817, 823, 843–844, 846, 868–870, 888–891, 916–919, 926–929, 934–942, 947–954
   _jetson_cboot.py | 261 | 261 | 0% | 20, 22–25, 27–29, 35–40, 42, 58–60, 62, 64–65, 71, 75, 134, 137, 139–140, 143, 150–151, 159–160, 163, 169–170, 178, 187–191, 193, 199, 202–203, 209, 212–213, 218–219, 221, 227–228, 231–232, 235–237, 239, 245, 250–252, 254–256, 261, 263–266, 268–269, 278–279, 282–283, 288–289, 292–296, 299–300, 305–306, 309, 312–316, 321–324, 327, 330–331, 334, 337–338, 341, 345–350, 354–355, 359, 362–363, 366, 369–372, 374, 377–378, 382, 385, 388–391, 393, 400, 404–405, 408–409, 415–416, 422, 424–425, 429, 431, 433–435, 438, 442, 445, 448–449, 451, 454, 462–463, 470, 480, 483, 491–492, 497–500, 502, 509, 511–513, 519–520, 524–525, 528, 532, 535, 537, 544–548, 550, 562–565, 568, 571, 573, 580, 587–589, 591, 593, 596, 599, 602, 604–605, 608–612, 616–618, 620, 628–632, 634, 637, 641, 644, 655–656, 661, 671, 674–680, 684–690, 694–703, 707–715, 719, 721, 723–725
   _jetson_common.py | 172 | 45 | 73% | 132, 140, 288–291, 294, 311, 319, 354, 359–364, 382, 408–409, 411–413, 417–420, 422–423, 425–429, 431, 438–439, 442–443, 453, 456–457, 460, 462, 506–507
   _jetson_uefi.py | 404 | 274 | 32% | 124–126, 131–132, 151–153, 158–161, 328, 446, 448–451, 455, 459–460, 462–470, 472, 484–485, 488–489, 492–493, 496–498, 502–503, 508–510, 514, 518–519, 522–523, 526–527, 531, 534–535, 537, 542–543, 547, 550–551, 556, 560–561, 564, 568–570, 572, 576–579, 581–582, 604–605, 609–610, 612, 616, 620–621, 624–625, 632, 635–637, 640, 642–643, 648–649, 652–655, 657–658, 663, 665–666, 674, 677–680, 682–683, 685, 689–690, 694, 702–706, 709–710, 712, 715–719, 722, 725–729, 733–734, 737–742, 745–746, 749–752, 754–755, 762–763, 773–776, 779, 782–785, 788–792, 795–796, 799, 802–805, 808, 810, 815–816, 819, 822–825, 827, 833, 838–839, 858–859, 862, 870–871, 878, 888, 891, 898–899, 904–907, 915–918, 926–927, 939–942, 944, 947, 950, 958, 966–968, 970–972, 974–978, 983–984, 986, 999, 1003, 1006, 1016, 1021, 1029–1030, 1033, 1037, 1039–1041, 1047–1048, 1053, 1061–1066, 1071–1076, 1081–1089, 1094–1101, 1109–1111
   _ota_status_control.py | 102 | 11 | 89% | 117, 122, 127, 240, 244–245, 248, 255, 257–258, 273
   _rpi_boot.py | 287 | 133 | 53% | 53, 56, 120–121, 125, 133–136, 150–153, 158–159, 161–162, 167–168, 171–172, 181–182, 222, 228–232, 235, 253–255, 259–261, 266–268, 272–274, 284–285, 288, 291, 293–294, 296–297, 299–301, 307, 310–311, 321–324, 332–336, 338, 340–341, 346–347, 354, 357–362, 393, 395–398, 408–411, 415–416, 418–422, 450–453, 472–475, 498–501, 506–514, 519–526, 541–544, 551–554, 562–564
   _slot_mnt_helper.py | 10 | 0 | 100%
   configs.py | 51 | 0 | 100%
   protocol.py | 5 | 0 | 100%
   selecter.py | 41 | 29 | 29% | 44–46, 49–50, 54–55, 58–60, 63, 65, 69, 77–79, 81–82, 84–85, 89, 91, 93–94, 96, 98–99, 101, 103
src/otaclient/configs
   __init__.py | 17 | 0 | 100%
   _cfg_configurable.py | 47 | 0 | 100%
   _cfg_consts.py | 47 | 1 | 97% | 97
   _common.py | 8 | 0 | 100%
   _ecu_info.py | 56 | 4 | 92% | 59, 64–65, 112
   _proxy_info.py | 50 | 8 | 84% | 84, 86–87, 89, 100, 113–115
   cfg.py | 23 | 0 | 100%
src/otaclient/create_standby
   __init__.py | 13 | 1 | 92% | 36
   common.py | 226 | 44 | 80% | 59, 62–63, 67–69, 71, 75–76, 78, 126, 174–176, 178–180, 182, 185–188, 192, 203, 279–280, 282–287, 299, 339, 367, 370–372, 388–389, 403, 407, 429–430
   interface.py | 7 | 0 | 100%
   rebuild_mode.py | 115 | 10 | 91% | 98–100, 119, 150–155
src/otaclient/grpc/api_v2
   ecu_status.py | 145 | 7 | 95% | 117, 142, 144, 275, 347–348, 384
   ecu_tracker.py | 53 | 53 | 0% | 17, 19–22, 24–30, 32, 34, 38–39, 42, 44, 50–53, 55, 57, 59–62, 69, 73–76, 80–81, 83, 85, 87–95, 99–100, 102, 104–107
   main.py | 41 | 41 | 0% | 17, 19–24, 26–27, 29, 32, 39, 41–42, 44–45, 47–48, 50–55, 57–59, 61, 64, 70, 72–73, 76–77, 79–82, 84–85, 87
   servicer.py | 127 | 105 | 17% | 58–62, 64–65, 67–68, 74–78, 82–83, 88, 91, 95–97, 101–103, 111–113, 118–119, 122–123, 129, 132–136, 145–155, 162, 168, 171–173, 184–186, 189–191, 196, 203–206, 209, 213–214, 219, 222, 226–228, 232–234, 242–243, 248–249, 252–253, 259, 262–266, 275–284, 291, 297, 300–302, 307–308, 311
   types.py | 44 | 2 | 95% | 78–79
src/otaclient_api/v2
   api_caller.py | 39 | 6 | 84% | 45–47, 83–85
   types.py | 256 | 32 | 87% | 61, 64, 67–70, 86, 89–92, 131, 209–210, 212, 259, 262–263, 506–508, 512–513, 515, 518–519, 522–523, 578, 585–586, 588
src/otaclient_common
   __init__.py | 34 | 15 | 55% | 42–44, 61, 63, 68–77
   _io.py | 64 | 1 | 98% | 41
   cmdhelper.py | 13 | 0 | 100%
   common.py | 106 | 10 | 90% | 148, 151–153, 168, 175–177, 271, 275
   downloader.py | 199 | 10 | 94% | 107–108, 126, 153, 369, 424, 428, 516–517, 526
   linux.py | 61 | 15 | 75% | 51–53, 59, 69, 74, 76, 108–109, 133–134, 190, 195–196, 198
   logging.py | 42 | 4 | 90% | 56, 87–88, 95
   persist_file_handling.py | 118 | 18 | 84% | 113, 118, 150–152, 163, 192–193, 228–232, 242–244, 246–247
   proto_streamer.py | 42 | 8 | 80% | 33, 48, 66–67, 72, 81–82, 100
   proto_wrapper.py | 398 | 57 | 85% | 87, 134–141, 165, 172, 184–186, 189–190, 205, 210, 221, 257, 263, 268, 299, 303, 307, 402, 462, 469, 472, 492, 499, 501, 526, 532, 535, 537, 562, 568, 571, 573, 605, 609, 611, 625, 642, 669, 672, 676, 692, 707, 713, 762–763, 765, 803–805
   retry_task_map.py | 129 | 9 | 93% | 134–135, 153–154, 207–208, 210, 230–231
   shm_status.py | 95 | 21 | 77% | 79–80, 83–84, 105, 120–122, 134, 139, 156–160, 169–170, 172, 179, 192, 204
   typing.py | 31 | 4 | 87% | 48, 97–98, 100
TOTAL | 7001 | 1936 | 72%

Tests: 251 · Skipped: 0 💤 · Failures: 0 ❌ · Errors: 0 🔥 · Time: 13m 7s ⏱️

@Bodong-Yang Bodong-Yang marked this pull request as ready for review December 25, 2024 08:34

@airkei (Collaborator) left a comment

Thank you, I left one comment. It might be better to use the csv library.

Comment on lines +89 to +90:

    for _idx, line in enumerate(f, start=1):
        _new = parse_dirs_csv_line(line)

Collaborator

If this parser only handles CSV, how about using the standard csv library instead of regex? I imagine regex was chosen to reuse the existing parser for the text format, but I feel the csv library is safer for a CSV parser.
Another idea is to factor the regex logic out into a helper function shared by the text and CSV parsers. It may be a matter of preference, but I don't mind creating/using such utility functions across classes.
https://docs.python.org/3/library/csv.html
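For reference, the suggestion above would look roughly like the sketch below: the standard library's `csv.reader` with `quotechar="'"`, so that a single-quoted path containing a comma stays in one field. The line layout `mode,uid,gid,'path'` and the helper name `split_csv_line` are hypothetical, used only for illustration.

```python
import csv
from typing import List


def split_csv_line(line: str) -> List[str]:
    # Single quotes act as the quote character, so commas inside a quoted
    # path do not split the field.
    reader = csv.reader([line], delimiter=",", quotechar="'")
    return next(reader)


# Hypothetical metadata line: octal mode, uid, gid, quoted path.
fields = split_csv_line("0755,0,0,'/opt/a,b'")
mode, uid, gid, path = int(fields[0], 8), int(fields[1]), int(fields[2]), fields[3]
```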

@Bodong-Yang (Member, Author)

> It might be better to use the csv library.

@airkei 😢 I would like to, but again, the CSV format of the legacy OTA image is somewhat nonstandard. A long time ago I did try to use the csv library, but I gave up.
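The thread does not spell out what makes the format nonstandard. One plausible case is sketched below, under the assumption (not confirmed here) that a literal single quote inside a quoted path is written shell-style as `'\''` and undone by a `de_escape()` helper (the name appears in the code quoted later in this thread; the body below is assumed). Under that assumption, `csv.reader` returns a mangled field, while a regex plus an explicit replace recovers the path.

```python
import csv
import re

# Assumed escaping: a literal ' inside a quoted path is written as '\''
# (shell style). This is an illustrative assumption about the legacy format.
LINE = r"0644,0,0,'/tmp/it'\''s'"


def de_escape(s: str) -> str:
    # Assumed body: undo the shell-style escaping described above.
    return s.replace(r"'\''", "'")


# csv.reader handles doubled quotes ('') or an escapechar, but it does not
# undo '\'' escaping, so the fourth field comes back mangled.
csv_path = next(csv.reader([LINE], quotechar="'"))[3]

# Regex approach: capture everything between the first and the last quote,
# then de-escape it explicitly.
_pa = re.compile(r"(?P<mode>\d+),(?P<uid>\d+),(?P<gid>\d+),'(?P<path>.*)'")
ma = _pa.fullmatch(LINE)
assert ma is not None
regex_path = de_escape(ma.group("path"))
```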

@Bodong-Yang (Member, Author)

(Since the early days of this repository, regex has been used to parse the CSV files.)

@airkei (Collaborator) left a comment

OK, in that case, I agree with proceeding with this approach.
I also feel it might be much more maintainable to isolate the legacy modules completely rather than creating utility functions...

@Bodong-Yang (Member, Author)

> I also feel it might be much more maintainable to isolate the legacy modules completely rather than creating utility functions...

Could you elaborate on your idea in more detail?

@airkei (Collaborator) commented Dec 26, 2024

> Could you elaborate on your idea in more detail?

Sure! Currently there is cloned code between parser and csv_parser:

    res.mode = int(_ma.group("mode"), 8)
    res.uid = int(_ma.group("uid"))
    res.gid = int(_ma.group("gid"))
    res.slink = de_escape(_ma.group("link"))
    res.srcpath = de_escape(_ma.group("target"))

My idea is to create a single function that consolidates these regex-handling parts shared by parser and csv_parser.
But in that case, if we change the format of either the CSV or the text files, we need to modify or split that function. These are "legacy" modules that will not be updated frequently, so I think we don't need to worry much about the code clone.
This relates to the trade-off between "SOLID" and "Code Clone".

@Bodong-Yang (Member, Author)

> My idea is to create a single function that consolidates these regex-handling parts shared by parser and csv_parser.
> But in that case, if we change the format of either the CSV or the text files, we need to modify or split that function. These are "legacy" modules that will not be updated frequently, so I think we don't need to worry much about the code clone.
> This relates to the trade-off between "SOLID" and "Code Clone".

No worries, this PR is part of my effort to rewrite the ota_metadata.legacy package, as we are migrating to sqlite3 for operating on the file_table.
Once the new implementation is ready, I will remove the old code: the old parser module, along with the rest of the old implementation, will be removed.
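To illustrate what operating on the file_table through sqlite3 can buy, here is a minimal sketch in which lookups become SQL queries instead of passes over parsed text. The `ft_regular` table, its columns, and the sample rows are hypothetical; they are not the schema introduced in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical file_table schema for regular files.
conn.execute(
    "CREATE TABLE ft_regular (path TEXT PRIMARY KEY, size INTEGER, digest BLOB)"
)
with conn:
    conn.executemany(
        "INSERT INTO ft_regular VALUES (?, ?, ?)",
        [
            ("/etc/hostname", 8, bytes.fromhex("aa")),
            ("/etc/hosts", 120, bytes.fromhex("bb")),
            ("/usr/bin/tool", 4096, bytes.fromhex("aa")),
        ],
    )

# Total size of the files under /etc, computed inside sqlite3.
(etc_size,) = conn.execute(
    "SELECT SUM(size) FROM ft_regular WHERE path LIKE '/etc/%'"
).fetchone()

# Digests shared by more than one file (i.e. files with identical content).
dupes = conn.execute(
    "SELECT digest, COUNT(*) FROM ft_regular GROUP BY digest HAVING COUNT(*) > 1"
).fetchall()
```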

@Bodong-Yang Bodong-Yang merged commit bfe9b24 into main Dec 26, 2024
8 checks passed
@Bodong-Yang Bodong-Yang deleted the feat/csv_parser branch December 26, 2024 05:07
Labels
feature New feature is introduced

2 participants