A Conversation About Cyclomatic Complexity
We’ve talked about cyclomatic complexity on the podcast a few times (116: You Have to Care and 94: Don’t Be Clever). In summary, cyclomatic complexity is a measure of how complicated your code is. Less complicated code is easier to debug and test, and often faster to run.
Code should be as complicated as it needs to be, but no more complicated.
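Note: as a rough illustration (my own example, not from the chat below), the usual recipe is one plus the number of decision points in a function, so this little C function scores a 4:

    #include <stdio.h>

    /* Complexity = 1 (for the function) + 1 (for) + 2 (ifs) = 4 */
    int count_positives(const int *values, int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++) {     /* +1 */
            if (values[i] > 0)            /* +1 */
                count++;
            else if (values[i] == 0)      /* +1 */
                printf("zero at index %d\n", i);
        }
        return count;
    }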
However, I haven’t really been tracking cyclomatic complexity as a software concept; it hasn’t been on my personal radar. When it came up in a Slack channel recently, I enjoyed learning more.
Note: names changed to protect the not-very-innocent.
BlackCat [11:13 AM]: Do any of y’inz have a recommendation for a cyclomatic complexity calculator (hopefully for Eclipse)?
CouchLump [12:16 PM]: @HipCat recommended https://people.debian.org/~bame/pmccabe/download.html
BlackCat [12:23 PM]: Thank you, I’ll take a look
GoldenRetriever [1:37 PM]: I use the same one
BlackCat [1:48 PM]: ah, for Mac it is installable with brew
BlackCat [2:02 PM]: A complexity of 137 is probably “A Bad Thing®”. Right?
CouchLump [2:08 PM]: Relative complexity seems more interesting than a given number... like if you clean it up, can you make it better? Or is that 137 operations per line of code?
BlackCat [2:09 PM]: in this case, probably not, it’s one of those “now that we have all of the data, let’s crunch the numbers”
137 for 426 lines of code
I expect that my command line parser would be pretty high too.
Dread Cat Roberts (DCR) [5:22 PM]: pmccabe and GNU Complexity give different numbers, but 137 is bad in both
There are rare cases where the code might actually be OK
Like an enormous switch statement
But you could also maybe just do something more table driven there
I've never seen code >30 which I was actually completely happy with
So it's a good metric even if overplayed sometimes
Wow. Uber is shutting down in Arizona
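Note: to make DCR’s table-driven suggestion concrete, here’s a sketch of mine (not from the chat). A switch that only maps values to values can collapse into a lookup table, taking most of the branch count with it:

    /* A switch that only maps values to strings; every added case
     * adds another path through the function. */
    const char *status_name_switch(int code)
    {
        switch (code) {
        case 0:  return "ok";
        case 1:  return "warning";
        case 2:  return "error";
        case 3:  return "fatal";
        default: return "unknown";
        }
    }

    /* The table version stays at a complexity of 3 (one if, one &&)
     * no matter how many entries the table grows. */
    const char *status_name_table(int code)
    {
        static const char *const names[] = { "ok", "warning", "error", "fatal" };
        if (code >= 0 && code < (int)(sizeof(names) / sizeof(names[0])))
            return names[code];
        return "unknown";
    }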
Otter [5:53 PM]: NASA photographer's camera cooked by last week's SpaceX rocket launch
GoldenRetriever [6:23 PM]: For our safety code we limit complexity to <= 20. 20 was picked sort of arbitrarily, but the few times we’ve had code > 20 I’ve agreed “yeah, that’s overly complicated” and refactored it to be cleaner.
Otter [7:13 PM]: I’m scared to run it here
But I’m gonna
I assume you can target a module or a set of c files?
CouchLump [7:36 PM]: You know, this would be a good blog post. I had no idea that there are targets.
Otter [7:41 PM]: I nominate someone knowledgeable
HipCat [8:00 PM]: Complexity of 137 is bad! No relative here.
< 10 good
10-20 not great but you can work with it (at Element we aimed for 15 as a max)
> 20 refactor or have a damn good reason (we had generated decision trees that were all 32 because that’s the way machine learning code works)
@Otter, yes, just hand pmccabe a list of files. The output isn’t the greatest but use the values from column one.
Otter [8:04 PM]: Thanks. I’m guessing certain areas I know of are pushing 200
DCR [9:05 PM]: We had something close to 1000 in one function. I wrote it of course.
It had an apologetic comment and a command that it be refactored when I left it
Of course the next 50 additions to that function did no such thing
BlackCat [9:31 PM]: I was sure that that function was going to be bad, like really bad. Not 17,000 bad. It’s a number cruncher though, implementing all of the various options of the system and all of the customer’s “oh, it has to do this too” paste-ons.
I wasn’t surprised when it was the function in the system with the maximal complexity.
The rest of the system is under 20, except for the web server. (Nobody said no to feature bloat, and it was all verbal.)
The reason I asked about a complexity-measuring app is that I was listening to the IEEE Software Engineering podcast on the way to work and they started talking about unit testing.
Dude figured that, at a minimum, your unit test set should have one independent test per complexity index. So that function that I wrote should have at least 137 tests, one per path through the code.
I figured it was a good, if insufficient, rule of thumb.
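Note: that minimum is McCabe’s basis path testing idea. In miniature (my sketch, not BlackCat’s code): this function has a complexity of 3, so it has three independent paths and gets at least three tests:

    #include <assert.h>

    /* Complexity = 1 + 2 (ifs) = 3, so three independent paths */
    int clamp(int value, int lo, int hi)
    {
        if (value < lo)
            return lo;
        if (value > hi)
            return hi;
        return value;
    }

    int main(void)
    {
        assert(clamp(-5, 0, 10) == 0);   /* path 1: below range */
        assert(clamp(50, 0, 10) == 10);  /* path 2: above range */
        assert(clamp(7, 0, 10) == 7);    /* path 3: in range */
        return 0;
    }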
DCR [10:18 PM]: The old logic was that complexity of 137 means the code is not testable
I don't believe that with fuzzers
But it is still not good code
BlackCat [10:19 PM]: I’d buy that. If there is ever any question about that code, I cringe.
Now that I have a complete set of specs, I can probably begin writing the code, or release it :wink:
HipCat [7:19 AM]: Btw, the columns for pmccabe are described well in the docs:
Column 1 contains cyclomatic complexity calculated by adding 1 (for the function) to the occurrences of for, if, while, switch, &&, ||, and ?. Unlike "normal" McCabe cyclomatic complexity, each case in a switch statement is not counted as additional complexity. This treatment of switch statements and complexity may be more useful than the "normal" measure for judging maintenance effort and code difficulty.
Column 2 is the cyclomatic complexity calculated in the "usual" way with regard to switch statements. Specifically it is calculated as in column 1 but counting each case rather than the switch and may be more useful than column 1 for judging testing effort.
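Note: to make the two columns concrete, here is a made-up function counted both ways, using the rules quoted above:

    /* Hypothetical handlers, just to make the example self-contained */
    static int do_read(void)  { return 0; }
    static int do_write(void) { return 0; }
    static int do_erase(void) { return 0; }

    /* Column 1 (modified):    1 + 1 (if) + 1 (switch) = 3
     * Column 2 (traditional): 1 + 1 (if) + 3 (cases)  = 5 */
    static int decode(int cmd, int dry_run)
    {
        if (dry_run)
            return 0;
        switch (cmd) {
        case 1: return do_read();
        case 2: return do_write();
        case 3: return do_erase();
        default: return -1;
        }
    }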
And I don’t remember where I got this but this is typically what I’ve seen:
In general, for method level complexity:
< 10 Easy to maintain
11-20 Harder to maintain
21+ Candidates for refactoring/redesign
Also consider that higher complexities make the code harder to unit test.
Note the “in general”. This is not a hard-and-fast rule. Yes, there will be times when a function is just complex. But you should strive to bring it down.
BlackCat [7:21 AM]: BTW, pmccabe -v
HipCat [7:33 AM]: This is what I ended up using to show the worst offenders:
pmccabe -v $(SRC_FILES_FOR_COMPLEXITY_REPORT) | sort -nr > complexity_report.txt
I had someone write something that took that and HTML-ized it so it was easier on the eyes.
Sorry, this should all be in #software
BlackCat [7:35 AM]: I was thinking of putting almost exactly that in the post-processing stage of my Eclipse compiles so that the highest-complexity routine is displayed on the bottom line.
HipCat [7:35 AM]: I grabbed that “in general” quote from here:
Software Engineering Stack Exchange: What does the ‘cyclomatic complexity’ of my code mean?
“I am new to static analysis of code. My application has a Cyclomatic complexity of 17,754 lines of code. The application itself is only 37,672 lines of code. Is it valid to say that the complexity is…”
The numbers line up with what I’ve seen over the years.
BlackCat [7:37 AM]: Right, that’s where I got some round numbers too. 17K??? No, my stinker of a 137 isn’t that bad, but I still figured it was going to be bad.
HipCat [7:37 AM]: That question is ill-formed.
BlackCat [7:38 AM]: 17K is badly written. Arr arr arr
CouchLump [8:40 AM]: Fine, I am going to cut and paste all of this into a post. And leave in typos.
… days go by …
BlackCat [11:23 AM]: So, back to cyclomatic complexity. There’s a comment above that you can get high complexity numbers if you have huge switch statements. That’s how my code is getting flagged. My USB command line parser is 38. My UART command line parser is 34. The whole rest of the system is under 20.
Either I game the system and don’t check return codes (ain’t gonna happen), I look at a new structure like calling through function pointers, or I just live with a higher than expected complexity number because of the nature of parsing.
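Note: the function-pointer structure BlackCat mentions might look something like this (command names and handlers invented for the sketch). The dispatch loop’s complexity stays put as commands are added; each handler carries its own small number:

    #include <string.h>

    typedef int (*cmd_handler_t)(const char *args);

    /* Hypothetical handlers; each gets its own small function */
    static int cmd_help(const char *args)    { (void)args; return 0; }
    static int cmd_version(const char *args) { (void)args; return 0; }
    static int cmd_reset(const char *args)   { (void)args; return 0; }

    static const struct {
        const char    *name;
        cmd_handler_t  handler;
    } commands[] = {
        { "help",    cmd_help    },
        { "version", cmd_version },
        { "reset",   cmd_reset   },
    };

    /* The dispatcher's complexity is fixed, however long the table gets */
    int dispatch(const char *name, const char *args)
    {
        for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
            if (strcmp(commands[i].name, name) == 0)
                return commands[i].handler(args);
        }
        return -1;  /* unknown command */
    }

Whether that’s actually more readable than a switch is exactly the judgment call HipCat raises below.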
BlackCat [11:33 AM]: I’m not a big fan of arbitrary rules like a function having a maximum size of 2 pages of code. What’s a page? Or having a bunch of little functions that really should be in the body of a switch. But I haven’t been kicked by anybody about that yet.
HipCat [11:53 AM]: @BlackCat remember the “in general” comment. There can be exceptions to the rule. How you handle those is up to you and the team. We made a comment above the function saying why it was exempt.
BlackCat [11:56 AM]: I agree. This is between me and me. CC does show the crunchy bits of the code though.
HipCat [11:57 AM]: About your switch statement though: is it “complex” because it has a ton of case statements? Do those have code in them beyond a function call? Can you break out the code from each case to another function? Are you sharing state between states? There are many approaches to reducing complexity but readability should always be a top priority (i.e., don’t do complicated function pointers to reduce some other metric).
BlackCat [11:59 AM]: Good point, I wonder what it would say about a state machine. I had one project where I was able to do a table state machine instead of a case state machine. I spent a great deal of time explaining how the thing worked to EVERYBODY. Whereas a case state machine would have almost been obvious.
I have some ideas of how to reduce my 137 function, but I expect that the customer will say “could you just add one more thing?” AAAARRRRGGG!
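Note: a table state machine like the one BlackCat describes might look roughly like this (states and events invented for the sketch). The metric loves it (the step function scores a 1), but as BlackCat found, you pay in explanation time:

    enum state { STATE_IDLE, STATE_RUN, STATE_DONE, NUM_STATES };
    enum event { EV_START, EV_STOP, EV_FINISH, NUM_EVENTS };

    /* next_state[current][event]: the whole machine lives in data */
    static const enum state next_state[NUM_STATES][NUM_EVENTS] = {
        /*               EV_START     EV_STOP     EV_FINISH   */
        /* STATE_IDLE */ { STATE_RUN,  STATE_IDLE, STATE_IDLE },
        /* STATE_RUN  */ { STATE_RUN,  STATE_IDLE, STATE_DONE },
        /* STATE_DONE */ { STATE_DONE, STATE_DONE, STATE_DONE },
    };

    /* Complexity 1: no branches at all in the step function */
    enum state step(enum state s, enum event e)
    {
        return next_state[s][e];
    }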
Otter [12:21 PM]: Function Complexity vs. stack depth is a thing
BlackCat [12:23 PM]: how about
if (SomeCall() != Success) { printf("an error message"); }
versus
assert(SomeCall() == Success);
I suppose in the second case you get the file and line number, so you’re not searching around for some text. (assuming that this doesn’t go through the error logger)
HipCat [12:25 PM]: Never put code that has to be called inside asserts.
Yep, @Otter , but write for readability and simplicity first. And document in the code why you had to go the more complex route.
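Note: HipCat’s warning in concrete form (my sketch, reusing the chat’s SomeCall/Success names as hypothetical stand-ins). assert is a macro that compiles to nothing when NDEBUG is defined, which is typical of release builds, so any call inside it silently disappears:

    #include <assert.h>

    typedef enum { Success, Failure } status_t;

    /* Hypothetical stand-in for the chat's SomeCall() */
    static status_t SomeCall(void) { return Success; }

    void risky(void)
    {
        /* With -DNDEBUG this whole line expands to nothing and
         * SomeCall() never runs at all. */
        assert(SomeCall() == Success);
    }

    void safe(void)
    {
        /* The call happens in every build; only the check is debug-only. */
        status_t result = SomeCall();
        assert(result == Success);
        (void)result;  /* keep -DNDEBUG builds warning-free */
    }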
Well, I’m sure it will go on and there will be more, but HipCat leaves us on a good note: write for readability and simplicity first, and document why you had to add complexity.