Let’s say that your next idea—which could be the next big idea—involves a web-based collection, compilation, or some presentation of a sliver of “big data” so pioneering, maybe even disruptive, that customers and investors will come chomping at the bit to get their hands on it. Your idea, undoubtedly, has an e-commerce angle, such as a proprietary feature complete with pricing information indexed for your customers’ convenience. A meaningful portion of your solution’s value will likely stem from this carefully selected catalog of prices. So, how do you protect it?

There are several mechanisms of protection at your disposal, some technical and others legal. Determining the specific type and degree of security measures that you will deploy to defend against the myriad potential threats is a business decision, which must be made early and revisited often. However, one modern technical phenomenon, data scraping, presents a particularly tricky business dilemma that warrants deeper analysis.

Data scraping is a technical means for gathering and copying targeted data from a website or other database. In general, data scraping consists of using computer programs to process a website’s human-readable content or HTML instead of relying on the website’s API, which typically requires prior authorization from the website owner or operator. Data scraping is the primary means deployed by internet search engines and web aggregators to catalog and organize the immense dataset found online. There is often great mutual benefit in the relationship between web aggregators and the targeted websites. Aggregators need content to function, and website owners, presumably, want web users to see their content. It is not always clear, however, what the law proscribes when the owner of a website wants to shield its content from the data-scraping technology employed by search engines and others.
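To make the mechanics concrete, the following is a minimal sketch of a scraper that reads prices out of a page's HTML rather than requesting them through an API. It assumes a hypothetical page layout in which each product is an `li` element containing `span` elements with the classes `name` and `price`; the markup, the class names, and the `PriceParser` and `scrape_prices` names are all illustrative assumptions, not taken from any real website. The page is supplied as an inline string so that the example runs without network access.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from the hypothetical product markup."""

    def __init__(self):
        super().__init__()
        self.items = []       # completed (name, price) pairs
        self._field = None    # "name" or "price" while inside such a span
        self._current = {}    # fields gathered for the product being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside a name or price span.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # A product is recorded only if both fields were found.
            if {"name", "price"} <= self._current.keys():
                price = float(self._current["price"].lstrip("$"))
                self.items.append((self._current["name"], price))
            self._current = {}


def scrape_prices(html):
    """Return a list of (name, price) tuples found in the HTML text."""
    parser = PriceParser()
    parser.feed(html)
    return parser.items


# Hypothetical page content standing in for a fetched web page.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$19.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$5.00</span></li>
</ul>
"""

if __name__ == "__main__":
    print(scrape_prices(SAMPLE_PAGE))
```

A production scraper would typically fetch pages over HTTP and cope with far messier markup, but the core step is the same: the program reads the content that the site presents to human visitors and extracts the targeted data from it.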

As the business and legal worlds continue to grapple with the full scope and latitude of data scraping, it is important to consider its implications from both perspectives. From the legal perspective, lawmakers, lawyers, and judges are continually working to conceptualize data scraping in a manner that fits the existing legal landscape. The ever-changing nature of the internet, data scraping included, makes it difficult for the law to keep pace, especially since the law is notorious for lagging behind technological progress. So, as the internet continues to evolve, dislocating one industry after another, there is a remarkable irony at the heart of it all: the legal ambiguity of data scraping.

As we have discussed before, one legal regime currently wrestling with the concept of data scraping is the Computer Fraud and Abuse Act (“CFAA”), which proscribes unauthorized access to a protected computer in the form of either (i) access without authorization, or (ii) use that exceeds authorized access, such authorization being either express or implied. Under the CFAA, courts have held that certain uses of data-scraping software to extract information from or gain access to a database or website are actionable, and companies that employ data-scraping and data-crawling software ignore the CFAA at their own peril.

An important distinction emerging from the CFAA case law is whether the targeted data is publicly available or, instead, private and protected (for example, presented behind a paywall or requiring user credentials). Developments in several recent and ongoing cases suggest that courts, at least lower federal courts, are amenable to a constrained interpretation of the CFAA as it applies to publicly available information.

Intuitively, making data publicly available on the internet suggests, at the least, that the owner or operator of the website is willing to grant some form of implied permission to all web users to access that information. Nonetheless, some website owners have sought to use the CFAA to block unwanted data scraping by arguing that access to publicly available data can be “unauthorized” once the owner or operator revokes any implied permission or authorization through a cease and desist letter or the like. Interestingly, there is currently a split regarding whether a website owner can revoke implied authorization to access publicly available information, and a case on appeal may answer this question.

Given the unsettled nature of this area of the CFAA, it is possible that certain forms of data scraping related to publicly available information could trigger criminal and civil penalties under the CFAA, especially if such activities draw the ire of the website owner or operator. As the courts seek to flesh out this area of the law, we will be watching closely for any developments. For more information on this topic, please contact Kris Kappel or Liam Reilly.