-
Notifications
You must be signed in to change notification settings - Fork 0
/
BlogCodeAnalyzerOld.html
442 lines (429 loc) · 23.4 KB
/
BlogCodeAnalyzerOld.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
<!DOCTYPE html>
<html>
<head>
<!--
- BlogCodeAnalyzer.htm - Code Artist Thoughts
- ver 1.0 - 02 September 2016
- Jim Fawcett, Syracuse University
-->
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
<meta name="description" content="Software Engineering course notes. Code Samples. Software Links" />
<meta name="keywords" content="Lecture, Notes, Code, Syracuse,University" />
<meta name="Author" content="Jim Fawcett" />
<meta name="Author" content="James Fawcett" />
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<title>Blog.CodeAnalyzer</title>
<script src="js/ScriptsUtilities.js"></script>
<script src="js/ScriptsTemplate.js"></script>
<script src="js/ScriptsKeyboard.js"></script>
<script src="js/ScriptsMenu.js"></script>
<script src="js/ScriptsSizerComp.js"></script>
<link rel="stylesheet" href="css/StylesSizerComp.css" />
<link rel="stylesheet" href="css/StylesTemplate.css" />
<link rel="stylesheet" href="css/StylesDefault.css" />
<link rel="stylesheet" href="css/StylesMenu.css" />
<link rel="stylesheet" href="css/StylesBrownTheme.css" />
<style>
.itemHeading {
margin-top: 10px;
margin-bottom: 10px;
font-size: 1.1em;
font-weight: 500;
}
</style>
<style>
#github header {
margin-left: 0px;
margin-right: 0px;
}
#github #pagetitle {
background-color: #e3dac9;
color: #800020;
border: 1px solid maroon;
}
#github #title {
background-color: #e3dac9;
color: #3f000f;
}
</style>
</head>
<body id="github" onload="initializeMenu()">
<navKeys-Container>
<nav-Key id="sKey" onclick="toggleSwipeEvents()">S</nav-Key>
<nav-Key id="rKey" onclick="location.reload()">R</nav-Key>
<nav-Key id="tKey" onclick="scrollPageTop()">T</nav-Key>
<nav-Key id="bKey" onclick="scrollPageBottom()">B</nav-Key>
<nav-Key id="hKey" onclick="helpWin()">H</nav-Key>
<nav-Key id="pKey" onclick="loadPrev()">P</nav-Key>
<nav-Key id="nKey" onclick="loadNext()">N</nav-Key>
</navKeys-Container>
<nav>
<div id="navbar"></div>
</nav>
<a id="Next" href="BlogMTree.html">N</a>
<a id="Prev" href="BlogParser.html">P</a>
<header>
<hgroup id="pagetitle">
<h1 id="title">Code Artistry - Code Analyzer</h1>
</hgroup>
</header>
<!-- page content -->
<h3>Initial Thoughts:</h3>
<t-b>
Code analysis is an interesting topic that takes us into many of the areas covered by our course sequence, especially
<a href="CSE681.htm">CSE681 - Software Modeling and Analysis</a>, <a href="CSE687.htm">CSE687 - Object Oriented Design</a>
and <a href="CSE776.htm">CSE776 - Design Patterns</a>. Before reading this blog you may wish to review the
<a href="BlogParser.htm">Code Parsing Blog</a>, as parsing is the most important and complex activity for the Analyzer.
I'll assume that you are familiar with that material and not repeat those discussions here.
</t-b>
<t-b>
The <strong>Mission</strong> of the analyzer is to compute size and complexity metrics for all the source code files
residing in the directory tree rooted at a specified path. It can also show you the Abstract Syntax Tree (AST) it built
to hold the analysis results, and show the files analyzed with their line counts. Another goal of this design is
to provide a structure that can easily be extended to other kinds of analysis, e.g., finding dependencies between all files
in the working set, or identifying some of the flaws in a design.
</t-b>
<t-b>
The design and implementation come in two parts. The first part is a console application written in C++, using some
of the features of the latest C++11 standard. It takes all of its inputs from its command line and writes its output
to the console and also, if that option is chosen, to a log file deposited at the root of the analysis path. This
first part implements the entire analysis functionality of the Analyzer.
</t-b>
<t-b>
The second part is a Graphical User Interface, written in the managed .Net language C++/CLI, and uses the
Windows Presentation Foundation (WPF) framework<sup>1</sup>. The purpose of the GUI is to support easily browsing to the
root directory for analysis and to collect user settings. The path and settings it builds into a command line and then
starts the Analyzer console application process with that.
</t-b>
<t-b>
The implementation was divided into these two parts because browsing for a starting path is much easier with a GUI using
an OpenFileDialog<sup>2</sup> than from the command line with a lot of CDing to navigate. However, a console is a better place to
send a lot of information which needs space and flexibility to convey a lot of resulting information.
</t-b>
<t-b>
In this discussion we look at the package structure of the analyzer, its activities, and the output it generates. At the
end we will draw some conclusions about what makes this design interesting and areas for improvement.
</t-b>
<ol>
<li>
<h3 style="margin-top:15px;">Packages:</h3>
<div style="width:calc(100vw - 9rem)"><div id="fig1"></div></div>
The Code Analyzer package structure is shown in the figure at the right. The packages focus on three areas:
<ul>
<li>
<div class="itemHeading">Building a List of Source Code Files:</div>
<div class="itemContent">
The Window package is used to identify the root directory for the analysis and to pick file patterns for the language
we want to analyze, e.g., *.h, *.cpp, or *.cs.
<br /><br />
FileMgr package constructs a list of fully qualified file names that match specifications provided by the window
package. It uses FileSystem package to query the file system to find child directories and files<sup>3</sup>
using operating system APIs (we have implementations for both Windows and Linux). It does its work by
recursively calling its search function with the root and then discovered child directories.
</div>
</li>
<li>
<div class="itemHeading">Parsing each Source Code File:</div>
<div class="itemContent">
The Parser package is responsible for analyzing a source file based on rules defined in the ActionsAndRules package.
Parser collects its input from the scanner packages Tokenizer and SemiExp. That results in a list of tokens that has
just enough information to make a decision about grammatical constructs without having to save left over information for the
next round. What that last sentence means is that the token sequence has just enough tokens to search for matches in the rule
collection.
<br /><br />
When a rule matches it invokes its actions. It is the actions that build the Abstract Syntax Tree (AST), using help from the
Scope Stack. That process continues until all of the files have been analyzed.
<br /><br />
The AbstrSynTree and ScopeStack packages provide all the mechanics for building the AST. The AST serves as a container for all
information that results from parsing all the selected files.
</div>
</li>
<li>
<div class="itemHeading">Displaying Results:</div>
<div class="itemContent">
Each of the displays is build by the Display package from information contained in the AST. This package provides functionality
to build:
<ul>
<li>
Metrics Display which shows, for each function, size in line count and complexity as measured by the number of descendent scopes
in the function.
</li>
<li>Abstract Syntax Tree visualization, an indented list of all the nodes in the AST.</li>
<li>SLOC list, showing each file and its source line count.</li>
</ul>
The logger package is used by display to write to the console and to a log file. It has three different kinds of output,
using three different specialized loggers:
<ul>
<li>Results mode - used to show normal output</li>
<li>
Demonstration mode - used to help users understand how the program works, mostly by looking at rule firings and
the resulting actions
</li>
<li>
Debug mode - used to look at parsing details to help make the parsing detections and actions correct.
</li>
</ul>
Analyzser code has statements for each of these three specialized loggers. The user just turns them on and off to
see the various types of output. That is done with checkboxes in the GUI Display Mode tab.
</div>
</li>
</ul>
<!--<div style="clear:both"></div>-->
</li>
<li>
<h3 style="margin-top:0px;">Activities:</h3>
<div style="width:calc(100vw - 9rem)"><div id="fig2"></div></div>
Activities are separated, in the activities diagram at the right, into two rows. The top row shows GUI activities and the bottom
shows the Analyzer activities.
<ul>
<li>
<div class="itemHeading">GUI Window Activities:</div>
Start sequentially, but may enter a loop where the user makes settings, runs the anlyzer and looks at the
analysis output. Then the user may select new settings, a new path for example, and runs the analyzer again.
<ul>
<li>
<div class="itemHeading">Create Views:</div>
<div class="itemContent">
This first activity builds the views programmatically<sup>4</sup>. There are three: Execution, Setup, and Display Mode views.
Each view provides the contents for one of the GUI window tabs.
</div>
</li>
<li>
<div class="itemHeading">Retrieve User Settings:</div>
<div class="itemContent">
This activity reads a text file with the last set of user settings and populates controls on the window accordingly,
e.g., checking checkboxes and writing paths into textboxes.
</div>
</li>
<li>
<div class="itemHeading">Respond to User Actions:</div>
<div class="itemContent">
Here, the GUI responds, via its event handler functions, to button clicks and changes in checkboxes and text in textboxes.
For example, clicking on checkboxes or changing text in a textbox changes the member data of the Window class.
</div>
</li>
<li>
<div class="itemHeading">Create CodeAnalyzer Command Line:</div>
<div class="itemContent">
A string that represents the console application's command line is built using Window member data that was
determined by extracting information from check boxes and textboxes in the previous activity.
</div>
</li>
<li>
<div class="itemHeading">Start CodeAnalyzer Process:</div>
<div class="itemContent">
In response to a "Start Analysis" button click the button's event handler prepares information to start the
console application analyzer, passing it the command line prepared by the previous activity.
</div>
</li>
<li>
<div class="itemHeading">Save User Settings:</div>
<div class="itemContent">
At the end of an analysis the user settings are saved to a text file located in the same directory as the GUI and Analyzer
execution images. At this point the GUI is idle until the user either clicks the kill button to terminate or makes
changes to the settings to run another analysis.
</div>
</li>
</ul>
<br />
</li>
<li>
<strong>Code Analyzer activities</strong> are all sequential in the large, although there are many internal loops, not shown here,
that run during file analysis and parsing.
<ul>
<li>
<div class="itemHeading">Process Command Line:</div>
<div class="itemContent">
The CodeAnalyzer starts by processing its command line. That may result in immediate termination if needed settings
have been omitted. That won't happen when using the GUI, but might when users run the CodeAnalyzer from a command line.
That hasn't been shown in the diagram because we show the GUI driven activities.
If the command line has all the needed information, processing moves to the next activity.
</div>
</li>
<li>
<div class="itemHeading">Find Source Code Files:</div>
<div class="itemContent">
The CodeAnalyzer executive passes the path and file patterns to the FileMgr package for this activity. That results
in a recursive descent through the directory tree rooted at the specfied path, looking for all the files that match
the given patterns. That information flows into the next activity.
</div>
</li>
<li>
<div class="itemHeading">Parse Source Code:</div>
<div class="itemContent">
For each file in the file list the parser collects SemiExps and passes them to each of its rules until a match
occurs. The match results in calls into the AbstrSynTree package to add another node to the AST or add information
to an existing node on the top of the Scope Stack.
</div>
</li>
<li>
<div class="itemHeading">Build AST & Record Lines:</div>
<div class="itemContent">
The loop between "Parse Source Code" and "Build AST & Record Lines" is traversed for each
rule detection. This looping continues from the start of the first file analysis to the end of the last.
If the rule is for the beginning of a scope, a new node is pushed onto the Scope Stack. If the rule matches
an end of scope event then the node is popped off the stack. For all other rules information in the node at the
top of the stack is modified.
</div>
</li>
<li>
<div class="itemHeading">Complexity Analysis:</div>
<div class="itemContent">
When we get to this activity all the AST nodes have been added to the tree. Complexity Analysis just walks the AST and for each
namespace, class, struct, or function node, records the number of descendent scope nodes. That, of course,
has to happen just before we pop back to a parent node during the tree walk, at which time we record the
complexity information in the current node. When we leave this activity all of the analysis information has been
stored in the tree.
</div>
</li>
<li>
<div class="itemHeading">Display Metrics:</div>
<div class="itemContent">
The Display package gathers Metric information by walking the AST, collecting node name, type, starting line, line count, and complexity.
That is then formatted into a table and written out using the logger.
<br /><br />
Actions store data declarations encountered in some scope in the AST node for that scope. They keep track of whether the data
has public, protected, or private access specified. That information is stored along with the data declaration in the AST node
for that scope. This information is shown in the metrics display and also used in the last activity of the CodeAnalyzer
before termination.
</div>
</li>
<li>
<div class="itemHeading">Display SLOCs:</div>
<div class="itemContent">
While building the AST actions store size information in a table with filename keys. That is enumerated by the Display package
to provide this display.
</div>
</li>
<li>
<div class="itemHeading">Display AST:</div>
<div class="itemContent">
This display is generated by the Display package by walking the AST tree, extracting information, and displaying with
an indentation that is proportional to the depth of the current node in the tree.
</div>
</li>
<li>
<div class="itemHeading">Display Metrics Summary:</div>
<div class="itemContent">
This activity is very similar to the Metrics display processing except that the only information displayed is that which
exceeds limits for function size and complexity. Also shown are all of the public data defined anywhere in the AST.
</div>
</li>
</ul>
</li>
</ul>
<!--<div style="clear:both"></div>-->
</li>
<li>
<h3 style="margin-top:0px;">Output:</h3>
<div style="width:calc(100vw - 9rem)"><div id="fig3"></div></div>
You see, in the figure at the right, a typical output for VisualCodeAnalyzer execution. The user has selected both C++ and C# file analysis,
browsed to the root folder of the CodeAnalyser Solution, selected metric display, and started the CodeAnalyzer. The command line paramters
for the console application are shown at the top of the display, along with the date and time of execution.
<br /><br />
It is convenient to have the controls and output in separate windows. We can look at the console output, decide to change some execution
parameters, do that in the GUI window while still observing the output, and run again. One other point - one way communication from the GUI
application to the console is a lot eaiser to set up and manage than two way communcation between the executing code and GUI displays.
<br /><br />
Notice that public data is shown in the console window just below the class or struct that owns that data, and the display also localizes
it to a particular package. In this application the public data are members of a couple of structs that are strictly private
to the implementation. They are never returned from functions that another code author might have to deal with. I treat public data
from classes, and any construct that other code has to use, as an error of design. I do not do that for data held in private structs.
<div style="clear:both"></div>
</li>
</ol>
<p></p>
<h3>Summary - The Good, The Bad, and the Ugly<sup>5</sup>:</h3>
<div class="indent">
<h4>Here are things I like about this design:</h4>
<ol>
<li>
<strong>The combination of GUI and Console Application</strong>. GUI for browsing and setting parameters combined with a
console analysis application for execution of the analysis is simple, works well, and is very usable.
</li>
<li>
<strong>The structure is fairly simple</strong> for processing as complex as code analysis. We've built something close to a compiler front end.
It is of course simpler, because we need to recognize only a small part of the C++ and C# languages. It's also surprising how
much code is common for the analysis of both C++ and C# code.
</li>
<li>
<strong>The individual parts are all recognizable by name</strong> and function and they <strongs>distribute the program's complexity</strongs>
fairly uniformily among themselves.
</li>
<li>
<strong>Not much Need to Change:</strong><br />
The structure is such that other applications like dependency analysis will keep almost all the packages intact, only modifying
a few, like Window, ActionsAndRules, and Display, and those modifications will be small. Almost all the other of the fifteen
packages will not need to change.
</li>
</ol>
<h4>Here are things I don't like about the design:</h4>
<ol>
<li>
<strong>Processing is not concurrent.</strong><br />
Analysis of each file is independent of that for every other file. The only thing that is shared is use of the Astract Syntax tree
and scope stack.
<ul>
<li>
That means that we could make the expensive parsing part concurrent, provided that we let each analyzer thread have its own Abstract
Syntax Tree and Scope Stack. We just run each file's analysis on a thread pool thread.
</li>
<li>
That means that we have to build mechanics to merge the ASTs for each file, but that is close to trivial to accomplish.
</li>
<li>
We would also have to construct a parser for each file because the parser is welded to its scanner data source and that
could work only on a single file at a time. It turns out that is also easy to do. In fact some of my parser demos do just that.
</li>
</ul>
So making this application concurrent should be relatively easy to do. I just haven't done that yet.
</li>
<li>
<strong>It's incomplete.</strong><br />
I haven't gotten around to trying on Java code. I expect that since it works on C# it's very likely to need next to no changes
for Java.
</li>
</ol>
<h4>Here are the ugly parts:</h4>
<ol>
<li>
Well, I really can't think of anything I think is ugly about this design!
</li>
</ol>
</div>
<div style="height:30px;"></div>
<hr />
<ol>
<li>
The C++/CLI compiler tool chain does not have a Xaml processor so all of the WPF functionality needs to be implemented
with code - no declarative implementation.
</li>
<li>
You might thing we would use a FolderBrowserDialog for selecting directories. I did not for two reasons.
The first is that the FolderBrowserDialog control doesn't work very well. It does not scroll down to the
selected path when you open it. You have to manually scroll and that gets to be a pain.
The second reason is that sometimes we want to select specific files to process, not all those matching a
pattern. For that you need the OpenFileDialog.
</li>
<li>
C++ surprisingly does not have a directory manipulation library, so I wrote one a couple of years ago
and use that here.
</li>
<li>
We build views programmatically because we can't do that declaratively with C++/CLI. See footnote 1.
</li>
<li>
Thanks Clint! Great movie.
</li>
</ol>
<p>
<img class="photo" src="Pictures/campusStrip.jpg" alt="Newhouse" style="width:100%;" />
</p>
<info-bar></info-bar>
<script>
createSizer("Pictures/CodeAnalyzerPackages.JPG", "Fig 1. Code Analyzer Packages", 400, "fig1");
createSizer("Pictures/CodeAnalyzerActivities.JPG", "Fig 2. Code Analyzer Activities", 400, "fig2");
createSizer("Pictures/CodeAnalyzerOutput.JPG", "Fig 3. Code Analyzer Output", 600, "fig3");
</script>
</body>
</html>