Debunking the top five myths surrounding web data collection for businesses

Bright Data CTO Ron Kol

Ron Kol, CTO at Bright Data, talks to Business Leader about the main myths surrounding web data collection for companies.

Brands are tired of waiting around for data. It’s why public web data collection, perhaps better known as public web scraping, has become a “go-to” strategic move for many companies. It helps them stay competitive in a market that behaves like a row of dominoes, where one shift triggers another, often with no visibility into how or why it happened.

It’s why this real-time data is so crucial to those looking to make a profit while crushing the competition. However, this vast field carries more than a few myths about how it works, why it’s important, and whom it benefits.

Before we go any further, it’s worth remembering that web scraping delivers a vital real-time commodity, an essential resource that helps boost the success of your organisation. There are those who still believe the industry’s boundaries are undefined, but as it continues to grow and flourish, now’s the time to refute some of the misconceptions that have been associated with it in recent years.

Myth #1: Web data collection (or web scraping) is illegal

Let’s clarify something: public web scraping is not illegal, full stop. As long as the website is publicly accessible and not protected by a paywall or log-in portal, collecting its data is within the boundaries set by the courts. In fact, a US federal court recently issued a ruling in the hiQ/LinkedIn case that compares public web scraping to window shopping.

Moreover, startups, SMEs, and major enterprises all participate in public web data gathering to observe their competitors’ business decisions and trends, as well as conduct new market research and inquisitive analytics on their own data. The overall intention is to discover new opportunities for innovation and growth and to ensure that an organisation does not miss out on opportunities.

As with all processes, it is vital that businesses follow compliance regulations, and if their public web scraping is outsourced, they must always work with their data collection provider to ensure that all operations are legal and ethical. To avoid any doubt, businesses should work with providers to understand what can and can’t be collected, both from a legal and ethical standpoint.

Since this field still lacks dedicated regulation, it is everyone’s moral responsibility to make sure that the data they gather is collected ethically and serves the greater good. If it does not, they must reconsider their strategies. Failure to do so would be unethical and may result in legal violations.

Myth #2: Web scraping harms businesses and limits their ability to compete

Quite the opposite is true here. Public web data collection, or web scraping, provides you with the transparency needed when accessing the Internet. It allows all market players to openly compete by simply providing them with accurate market research information. For example, if Company A wishes to set their own pricing strategy in motion, they obviously need to be aware of the special offers or pricing of one of their main competitors, let’s call them Company B.

In the old days, Company A would send out “mystery shoppers” who would manually take note of Company B’s offerings and pricing and adjust their own accordingly to make them more attractive to consumers. Today, our shopping ecosystem has clearly gone digital, and these “mystery shoppers” have simply shifted into online data collection, which provides companies with the information they need to decide their pricing strategy or special offers. Online data collection ensures that companies can effectively compete and continue to attract their target consumer base.

Businesses benefit from the ability to openly compete, and consumers benefit from better offers, cheaper pricing, and an improved shopping experience. Online data collection drives an openly competitive market and promotes overall information transparency.

Myth #3: Web data collection may be legal but it’s not entirely ethical

Let’s start with the fact that public domain data can be openly accessed. You must ensure that your web data collection provider is committed to accessing public web data only. That data must still be treated with the utmost sensitivity, integrity, and professionalism. Done right, which means following international regulations and clear, well-established ethical guidelines that preserve users’ data privacy, your data collection is both legal and ethical.

Public web scraping simply provides you with the same Internet transparency that an average user enjoys. There are obvious risks and critical requirements you must address to confirm that you are conducting your data gathering in an ethical manner. These requirements are not optional or a “nice to have” addition to your company policy; they are a critical necessity that all operators must abide by – without exception.

Myth #4: Most data sources are considered private

This is incorrect, as the vast majority of web-based data is public. Internet growth statistics from Statista show that 4.66 billion people were using the internet as of January 2021. That’s close to 60% of the world’s population. Most of the world’s data has been generated within the last two years alone, and it is estimated that close to 70% of the data being generated is public (humans are responsible for close to 60% of that generated data). Although these statistics only give us a rough indication, the trend is clear to see.

When it comes to web scraping, providers can only gather information that is open to the public. Put simply, that means anything that you or I could access using a standard browser without logging in. If you have to log in, the data is off limits. It’s that simple.

Myth #5: Only ‘shady’ companies engage in online data collection

Wrong! Companies of all sizes, from Fortune 500 firms to startups and SMEs, gather and utilise public web data to inform their decision-making. The only difference is in the type of data they require and how frequently they need it. In today’s real-time economy, companies can’t thrive without being able to see the full market reality, and to do that, they need access to the largest data source. When our reality is mostly led by digital innovation, it is no surprise that public web data has become the “no-brainer” solution.

As the CTO of the market leader in the data collection domain, I might be expected to fight this corner. However, for this industry to succeed, we must be our own harshest critics and ensure that we and others looking to collect data aren’t tempted to engage in illegal or unethical activities in the absence of strict regulations.

With any emerging technology, especially within the data space, there will always be scrutiny of its purpose and legality. However, the technology serves the greater good by allowing businesses to prosper from the latest publicly available online insights. When analysing data collection, it’s important to understand what is being collected and how it is being collected.

With so many leading brands dependent on data insights, this will become a fast-growth industry, and it’s up to everyone in this community to promote legal and ethical compliance. If anything, it’s our moral duty to do so.
