Why Code Snippets From Stack Overflow Can Break Your Project
You’ll be surprised how many of the most common solutions contain security vulnerabilities
Cover Photo by Jacob Mejicanos on Unsplash
Stack Overflow has saved many programmers, including me. Some of us have never even visited its home page: we found the site by searching Google for our problem or bug.
This habit is convenient, but we might be unwittingly using code that contains serious bugs or security flaws. Even though it’s widely understood that copy-pasting code from Stack Overflow is a bad idea, developers still do it.
Copying code itself isn’t always a bad thing. Code reuse can promote efficiency in software development; why solve a problem that has already been solved well? But when the developers use example code without trying to understand the implications of it, that’s when problems can arise. — Ryan Donovan
You might think that this security scare is an urban myth. I can assure you that it is not.
I recently came across several incidents that caught my attention and I will speak briefly about each of them.
Illustration by Mateusz Kupilas
The Most Copied Stack Overflow Java Snippet of All Time Is Flawed!
I first came across this in a blog post written by the author of the code himself. You can read it here.
The author of the code, Andreas Lundblad, a Java developer at Palantir and one of the highest-ranked contributors to Stack Overflow, admitted to the flaw.
A 2018 research paper by Sebastian Baltes, published in the journal “Empirical Software Engineering”, identified a code snippet Andreas posted on the site as the most copied Java code taken from Stack Overflow and reused in open source projects. It has been copied and embedded in more than 6,000 GitHub Java projects.
This code snippet was provided as an answer to this question posted on Stack Overflow in 2010. Its function is to convert byte counts into a more human-readable format. For example, it converts 1,024 bytes into 1 KiB or 1,048,576 bytes into 1 MiB.
Having been informed by Sebastian Baltes about the remarkable spread of his code snippet, Andreas revisited the code and published a corrected version on his blog.
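To make this kind of edge case concrete, here is a minimal Java sketch of a byte-count formatter. It is neither Andreas's original snippet nor his corrected version: the class name `ByteCountSketch` and the method `humanReadableSI` are made up for illustration, and the sketch only handles non-negative counts with SI (1000-based) units. The detail to watch is rounding near a unit boundary, where a value such as 999,999 bytes should come out as 1.0 MB rather than 1000.0 kB.

```java
import java.util.Locale;

final class ByteCountSketch {
    // SI units above plain bytes; a long never exceeds the exabyte range.
    private static final String[] UNITS = {"kB", "MB", "GB", "TB", "PB", "EB"};

    // Hypothetical helper: formats a non-negative byte count with one decimal place.
    static String humanReadableSI(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative byte count: " + bytes);
        }
        if (bytes < 1000) {
            return bytes + " B";
        }
        long value = bytes;
        int unit = 0;
        // Values from 999_950 upward round to 1000.0 at one decimal place,
        // so move up a unit instead of printing something like "1000.0 kB".
        while (value >= 999_950 && unit < UNITS.length - 1) {
            value /= 1000;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value / 1000.0, UNITS[unit]);
    }

    public static void main(String[] args) {
        System.out.println(humanReadableSI(999));      // 999 B
        System.out.println(humanReadableSI(1_000));    // 1.0 kB
        System.out.println(humanReadableSI(999_949));  // 999.9 kB
        System.out.println(humanReadableSI(999_999));  // 1.0 MB
    }
}
```

The values just below and just above each unit boundary are the ones worth putting in a unit test, since typical inputs such as 1,500,000 bytes give no hint that anything is wrong.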
At the end of his article, Andreas laid out some valuable advice for developers:
Stack Overflow snippets can be buggy, even if they have thousands of upvotes.
Test all edge cases, especially for code copied from Stack Overflow.
Do include proper attribution when copying code. Someone might just call you out on it.
Although this bug was a trivial edge case that would only cause slightly inaccurate file size estimates, things could have been much worse. Let’s see some more examples.
Major Security Flaws in the Most Popular C++ Code Snippets on Stack Overflow
A research paper published in 2019 by Morteza Verdi et al. found that 69 of the most popular C++ code snippets posted on Stack Overflow in the past ten years contain major security flaws.
The 69 vulnerable code snippets are used in 2,589 GitHub projects. According to the researchers, the most common vulnerability propagated from Stack Overflow to GitHub is CWE-150.
CWE is a community-developed list of common software and hardware security weaknesses. It serves as a common language, a measuring stick for security tools, and as a baseline for weakness identification, mitigation, and prevention efforts.
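CWE-150 covers improper neutralization of escape, meta, or control sequences: the software fails to neutralize special character sequences in its input before passing that input on to another component.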
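As a rough illustration of what neutralizing control sequences can look like, the Java sketch below makes control characters visible before untrusted text is written to a log or terminal. It is not taken from the paper or from any of the flagged snippets: the class `ControlSequenceSketch` and the method `neutralize` are hypothetical names, and replacing each control character with a `\xNN` marker is only one possible policy.

```java
final class ControlSequenceSketch {
    // Hypothetical helper: replaces every ISO control character (including
    // ESC, CR, and LF) with a visible \xNN marker and keeps all other text.
    static String neutralize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);
            if (Character.isISOControl(codePoint)) {
                out.append(String.format("\\x%02X", codePoint));
            } else {
                out.appendCodePoint(codePoint);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An ANSI escape sequence that would otherwise recolor a terminal.
        String untrusted = "user\u001B[31m";
        System.out.println(neutralize(untrusted)); // user\x1B[31m
    }
}
```

Whether to escape, strip, or reject such input depends on where the text ends up, so treat the marker format here as a placeholder.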
The researchers have developed a Chrome extension that alerts developers when the Stack Overflow code snippet they are viewing contains security vulnerabilities. Although this extension isn’t available for general use, you can check out its source code here.
You Cannot Run Docker for Windows and Razer Synapse Driver Management Tool at the Same Time Because They Contain a Stack Overflow Bug
Two years ago, there was a strange issue with Docker for Windows: people were unable to start Docker on their Windows computers. A user opened an issue on GitHub, and several other users said they had run into it too. No one knew what was actually wrong until this Reddit post popped up.
The problem turned out to occur when you tried to run Docker for Windows while Razer Synapse was running in the background. If Razer Synapse is running, Docker thinks that an instance of itself is already running, and it won’t start.
Both applications want only one instance of themselves running. Although this is a legitimate requirement, its implementation was the root cause of the bug. The buggy code that caused the problem:
var name = string.Format("Global\\{0}", (object) Assembly.GetExecutingAssembly().GetType().GUID);
The problem is that the GUID returned is the GUID for the type System.Reflection.RuntimeAssembly and not a GUID for a type defined in the Docker for Windows assembly.
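The same class of mistake is easy to reproduce outside .NET. The Java sketch below is only an analogy, not the Docker or Razer code, and `LockNameSketch` and its nested classes are invented for illustration. Asking for the type of a type object (here `getClass().getClass()`) always describes the runtime's own reflection class, much as `GetType()` on an assembly object describes `System.Reflection.RuntimeAssembly`, so unrelated applications end up with the same name.

```java
final class LockNameSketch {
    // Stand-ins for two unrelated applications (hypothetical names).
    static final class DockerLikeApp {}
    static final class SynapseLikeApp {}

    // Buggy: getClass().getClass() is always java.lang.Class, so every
    // caller ends up with the same lock name.
    static String buggyLockName(Object app) {
        return "Global\\" + app.getClass().getClass().getName();
    }

    // Intended: derive the name from the application's own class.
    static String fixedLockName(Object app) {
        return "Global\\" + app.getClass().getName();
    }

    public static void main(String[] args) {
        Object docker = new DockerLikeApp();
        Object synapse = new SynapseLikeApp();
        // true: both applications compute "Global\java.lang.Class"
        System.out.println(buggyLockName(docker).equals(buggyLockName(synapse)));
        // false: the names now differ per application
        System.out.println(fixedLockName(docker).equals(fixedLockName(synapse)));
    }
}
```

Tested in isolation, the buggy version looks fine: the name is stable and the single-instance check works. The collision only shows up once a second program built on the same snippet runs on the same machine.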
Something interesting
If only one of the applications had used the incorrect code snippet above, there would not have been an issue in the first place. But, as it turns out, both applications used it and so ended up with the same global name. Whichever one started second concluded that an instance of itself was already running, which is why the two could not run at the same time.
But where do you think those two applications got their code snippet from?
You’ve already guessed it — Stack Overflow.
Here’s the flawed Stack Overflow post that both applications got their code snippet from.
You won’t see the flawed answer if you visit the page now, because the answer has since been edited. If you want to see it for yourself, visit an old archive of the page using the Wayback Machine.
Screenshot by Кекек Мачан
Key takeaways for developers by Foone Turing:
Think about how you would find this bug in your own programs.
You copy and paste the code and it seems to work. What you don’t realize is it’s broken — because you don’t run either of these programs which made the same mistake.
Should I Avoid Copying?
Not really
Stack Overflow is essential for any developer today. Most of the issues found in these projects, however, are basic security errors. If you understand what you are copying, there’s no harm in using it. But for the code to be production-ready, it needs adequate tests, especially for edge cases.
“If you borrow things and you don’t understand the content of what you’re borrowing, then you fall in this trap of reusing code that has potential vulnerabilities. Then you are just spreading those things around.”
If you’re going to reuse code, you need to understand that code.
**Ryan Donovan**
Happy coding!
Resources
World’s Most Copied SO Snippet by Andreas Lundblad.
The Most Copied Stackoverflow Java Code Snippet Contains a Bug by Catalin Cimpanu.
Copying code from Stack Overflow? You might paste security vulnerabilities, too by Stack Overflow Blog.
Usage and attribution of Stack Overflow code snippets in GitHub projects by Sebastian Baltes.
An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples by Morteza Verdi et al.