• How web scraping can be a valuable data source

    How web scraping can be a valuable data source

    Web scraping sounds like hard work, but it is more clever than arduous. The technique exploits a simple truth: the front end of a website, which you see, must talk to the back end to extract data and display it. A web crawler or bot can gather this information, and further work can organize the data for analysis.

    Digital marketers are forever seeking data to get a better sense of consumer preferences and market trends. Web scraping is one more tool toward that end.

    First crawl, then scrape

    “In general, all web scraping programs accomplish the same two tasks: 1) loading data and 2) parsing data. Depending on the site, the first or second part can be more difficult or complex,” explained Ed McLaughlin, partner at Marquee Data, a web scraping services firm.

    Web scraping bears some resemblance to an earlier technique: web crawling. Back in the 1990s, when the internet occupied less cyberspace, web crawling bots compiled lists of websites. The technique is still used by Google to scrape for keywords to power its search engine, noted Himanshu Dhameliya, sales director at process automation and web scraping company Rentech Digital.

    For Rentech, web scraping is just obtaining “structured data from a mix of different sources,” Dhameliya said. “We scrape news websites, financial data, and location reports.”

    “Web scraping data is collected on a smaller scale,” said George Tskaroveli, project manager at web scraping firm Datamam, “still amounting to millions of data points, but also collecting on a daily or more frequent basis.”

    “The defining features of modern web scraping are headless browsers, residential proxies, and the use of scalable cloud platforms,” said Ondra Urban, COO at scraping and data extraction firm Apify.
    “With a headless browser, you can create scrapers that behave exactly like humans, open any website and extract any data… [M]odern cloud platforms like AWS, GCP, or Apify allow you to instantly start hundreds or thousands of scrapers, based on the current demand for data.”

    Which party data? And how to get it

    There is a spectrum of data gathering, ranging from zero-party to third-party data, that marketers are forever picking through for the next insight. So where does web scraping fit into this continuum?

    “Web scraped data is most closely related to third-party data,” said McLaughlin, as marketers can then join this data with existing data sets. “Web scraping can also provide a unique data source that’s not heavily used by competitors, as may be the case with purchased lists,” he said.

    “Ninety-five percent of the work we do is third-party [data],” said Dhameliya. Scraping aims for the data trafficked between the front end and back end of the website. That may require an API crafted to tap this data stream, or using JavaScript with a Selenium driver, he explained.

    Most of Rentech’s work is for enterprises seeking marketing intelligence and analysis. Bots are tasked with periodic visits to websites, sometimes seeking product information, Dhameliya said. Some websites limit the number of queries coming from a single source. To work around that, Rentech uses AWS Lambda to execute a bot that launches queries from multiple machines, Dhameliya explained.

    It is not humanly possible to go through all the data to weed out “nulls and dupes,” Tskaroveli said. “Many clients collect data with their own devices or use freelancers. It’s a huge problem, not receiving clean data,” he said. Datamam relies on its own built-in algorithms to go through the “rows and columns,” automating quality assurance.

    “We write custom Python scripts to scrape websites.
    Usually, each one is customized to handle a specific website, and we can provide custom inputs, if needed,” said McLaughlin. “We do not use any AI or machine learning to automate the production of these scripts, but that technology could be used in the future.”

    “Any data that can be manually copied and pasted can be automatically scraped,” McLaughlin added. “[I]f you find a website with a directory of a list of potential leads, web scraping can be used to easily convert that website into a spreadsheet of leads that can then be used for downstream marketing processes.”

    “Social media are a different beast. Their web and mobile applications are extremely complex, with hundreds of APIs and dynamic structures, and they also change very often thanks to regular updates and A/B tests,” Urban said. “[U]nless you can train and support a large in-house team, the best way to do it is to buy it as a service from experienced developers.”

    “If [the client] is in e-commerce, you might get away with an AI-powered product scraper. You risk a lower quality of data, but you can easily deploy it over hundreds or thousands of websites,” Urban added.

    Scrape the web, but use some common sense

    There are limits, and opportunities, that come with web scraping. Just be aware that privacy considerations must temper the query. Web scraping is a selective, not a collective, dragnet.

    Data privacy is one of those limits. “Never collect the opinions or political views or information about families, or personal data,” said Dhameliya. Evaluate the legal risk before scraping. Do not collect any data that is legally risky.

    It’s important to understand that web scraping isn’t, and for legal reasons shouldn’t be, about collecting personally identifiable information.
    Indeed, web scraping of any data has been controversial, but it has largely survived legal scrutiny, not least because it’s hard to draw a legal distinction between web browsers and web scrapers, both of which request data from websites and do things with it. This has been litigated recently.

    Facebook, Instagram and LinkedIn do have rules governing which data can be scraped and which data is off-limits, Dhameliya said. For example, individual Facebook and Instagram accounts that are closed are private accounts. Anything that feeds data to the public world is fair game: the New York Times, Twitter, any space where users can post commentary or reviews, he added.

    “We don’t provide legal advice, so we encourage our clients to seek counsel on legal considerations in their jurisdiction,” McLaughlin said.

    Dig deeper: Why marketers should care about consumer privacy

    Web scraping is still a useful adjunct to other forms of data gathering. For Datamam clients, web scraping is a form of lead generation, Tskaroveli said. It can generate new leads from multiple sources or can be used for data enrichment to allow marketers to gain a better understanding of their clients, he noted.

    Another target for web scraping bots is influencer marketing campaigns, noted Dhameliya. Here the goal is identifying influencers who fit the marketer’s profile.

    “Start slow and add data sources incrementally. Even with our enterprise customers, we’re seeing huge enthusiasm to start with web scraping, as if it were some magic bullet, only to discontinue a portion of the scrapers later because they realize they never needed the data,” Urban said. “Start monitoring one competitor, and if it works for you, add a second one. Or start with influencers on Instagram and add TikTok later in the process. Treat the web scraped data diligently, like any other data source, and it will give you a competitive edge for sure.”
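
As a rough illustration of the two tasks McLaughlin describes (loading, then parsing), plus the “nulls and dupes” cleanup Tskaroveli mentions, here is a minimal Python sketch that uses only the standard library. It is not the code of any firm quoted above. The page structure (`<li class="listing" data-url="...">`), the `example-scraper/0.1` user agent, the sample businesses and every function and class name are assumptions invented for this example; a real site needs its own parsing rules, and its terms and the applicable law should be checked first.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def load(url: str) -> str:
    """Task 1, loading: fetch the raw HTML of one page (not called below)."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class ListingParser(HTMLParser):
    """Task 2, parsing: collect company/website pairs from a hypothetical
    directory page where each entry is <li class="listing" data-url="...">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._url = None  # not None only while inside a listing <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "listing":
            self._url = attrs.get("data-url") or ""

    def handle_data(self, data):
        if self._url is not None and data.strip():
            self.rows.append({"company": data.strip(), "website": self._url})

    def handle_endtag(self, tag):
        if tag == "li":
            self._url = None


def clean(rows):
    """Drop rows with empty fields ("nulls") and repeated websites ("dupes")."""
    seen, kept = set(), []
    for row in rows:
        key = row["website"].lower()
        if not row["company"] or not key or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def to_csv(rows) -> str:
    """Write the cleaned rows as spreadsheet-ready CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["company", "website"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample page so the sketch runs without touching the network.
SAMPLE_HTML = """
<ul>
  <li class="listing" data-url="https://acme.example">Acme Tools</li>
  <li class="listing" data-url="">No Website Co</li>
  <li class="listing" data-url="https://ACME.example">Acme Tools</li>
  <li class="listing" data-url="https://bolt.example">Bolt Supply</li>
</ul>
"""

parser = ListingParser()
parser.feed(SAMPLE_HTML)  # with a real page: parser.feed(load(url))
leads = clean(parser.rows)
print(to_csv(leads))
```

On the sample page, the entry with no website and the repeated Acme entry are dropped, leaving two rows. Pages that build their content with JavaScript would need a headless browser, as Urban notes, rather than a plain HTTP request.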


  • 4 tips for navigating sensitive customer data

    4 tips for navigating sensitive customer data

    Consumer data collection has exploded over the past decade. As users, we’ve grown too accustomed to sharing very personal data in this loosely regulated digital age through every topic searched, email sent and double-tap on a friend’s post. All these signals build a rich profile for targeting and personalization.

    Data-driven marketing has transformed not only how we engage with our customers but, even more importantly, how we target new prospective customers. But for many, this new era of ultra-sophisticated audience-based targeting is raising more questions than the martech industry can answer. Most pointedly, is today’s reliance on data-driven targeting becoming a surveillance state?

    The recent backlash led to the California Consumer Privacy Act (CCPA), which was signed into law in 2018 and went into effect in 2020. More states have since followed, giving consumers more control over what personal data can be collected, brokered and used for marketing.

    Dig deeper: Why marketers should care about consumer privacy

    Sensible vs. sensitive data targeting

    As marketers, it’s more imperative than ever to respect a person’s privacy while still using all the available data responsibly to create personal ad experiences. With little overall regulation, all types of data are at our fingertips to build cross-channel campaigns that can feel tailor-made for the user. It’s a fine line, though, between ads that will be met with delight and ads that will feel intrusive and even offensive.

    As users of all this tech, we know all too well when marketers overstep. That line depends heavily on what’s being sold and how personal the marketer makes the ad experience. A gut check on your data strategies can quickly reveal how personal or behavioral data may inadvertently target a minority or potentially stigmatized group.

    Suffice it to say, if you’re selling pet food, you can likely create some hyper-targeted and personalized ads without tripping the sensitivity trigger.
    On the other hand, if you’re targeting people with ailments, new or prospective moms or even plus-sized clothing buyers, it’s critical to take a close look at:

    • What data is being used.
    • How those audiences are modeled.
    • How you’re differentiating your messaging to existing customers versus prospective buyers.

    Since there’s never a cut-and-dried answer, here are four suggestions for navigating sensitive data.

    1. Steer clear of potentially stigmatizing data

    Targeting prospective customers based on ailment data, LGBTQ+ identity or racial background can put us in an all too obvious danger zone. However, it’s just as crucial to be aware of targeting audiences that could be stigmatizing or just too personal. Some more obvious examples of these audiences could include religion, political affiliation, mental health, military status or even data that reveal personal or financial hardship.

    Martech platforms have removed the most sensitive audiences over the past few years. Yet, many ad targeting platforms still contain this data in less conspicuous derivations. For instance, you can no longer target by race in Meta’s properties but can still target BET Awards viewers.

    One way to avoid crossing the line from sensible to sensitive targeting is to review the audiences Meta has removed over the past few years and see if any of your data strategies could touch a sensitivity nerve for your customer or prospect.

    2. Data usage for customer vs. prospect targeting

    Collecting data on your customers opens up all sorts of innovative and clever insights that can be used for targeting. With that comes the responsibility to use personal data carefully when building audiences and personalized recommendations.

    They may be your customers, but be cognizant that some data-driven recommendations can be interpreted in a way that may make them uncomfortable or that they may even find offensive.
    A big-box retailer learned this the hard way when it relied too heavily on programmatically generated ads and inadvertently served personalized ads for weight-loss products to plus-size apparel buyers. No surprise that the backlash was swift. Be aware of how you use data across the customer journey to avoid inadvertently putting consumers on the back foot.

    For prospect targeting, it’s even more critical to be judicious about how personally identifiable data is used. A good rule is to stay close to demographic and publicly available audience data.

    As in life, it’s true in advertising that brands get one chance to make a good first impression. An overly personal ad shown to a new prospect can feel like a stranger asking or assuming more about the user than they are prepared to share. Overstepping with new prospects will not only result in lower ad engagement but can quickly trigger a negative brand bias, and winning that trust back will be a long road.


  • 4 common problems marketers and data analysts can solve together

    4 common problems marketers and data analysts can solve together

    Marketers and data analysts see the world in very different ways. Because they are often working together on the same project, this can cause a lot of problems. However, the increasingly complex nature of marketing and the growing need for data-driven insights mean they must find places to work together.

    “Because they come from different worlds, there can be some head-butting and some frustration,” said Steve Petersen, marketing technology manager for subscription management platform Zuora, at The MarTech Conference.

    Here are four common marketing challenges where marketers and data analysts can help each other.

    Dig deeper: Marketing analytics: What is it?

    Media fragmentation and an increasing number of channels

    The complex media ecosystem is forcing marketers to run campaigns on an ever-increasing number of channels, including many varieties of social, streaming video, retail media networks, email and more. Not only do marketers need to test different advertising on each of those, they need to know how the channels impact each other.

    That’s where marketers need analysts’ methods and insights. Otherwise it’s impossible to put each effort into context and know how the campaign is performing overall.

    “Sometimes it’s really hard to isolate one thing and figure out its performance,” said Petersen. “So instead of trying to find out how we can isolate one factor, try to have an educated outlook and work with your data analysts.”

    Marketers can bridge the divide with analysts by looking at year-to-year comparisons or by measuring campaign performance together.

    External factors that affect marketing programs

    It’s not just that marketing functions are using an increasing number of channels. External factors can also affect marketing campaigns and the entire organization. As we all know, over the last two years the pandemic and other huge events have radically changed consumer behaviors.
    Marketers may forget or not know how to factor in the impact this has on campaigns.

    “We run into situations where [marketers] forget to take into account external factors that may have had an impact on their performance KPIs,” said Arti Munshi, senior market research manager for National University.

    Munshi shared the example of a sporting goods company that is affected by the Olympics. In comparing performance numbers year-over-year, the company has to account for what happens in the years the games take place. Also, while the Summer and Winter Games are usually held two years apart, the pandemic pushed them into consecutive years.

    “With a period when the Olympic Games wasn’t taking place, marketers definitely aren’t going to have an apples-to-apples comparison, and will interpret the data incorrectly,” said Munshi. This could lead to false expectations for future non-Olympics years, she added.

    Adding context to marketing initiatives

    Marketers and data analysts should be in a constant dialogue about the data that is needed to help power marketing campaigns. They shouldn’t just be searching for the “what” of data insights, but the “why” that drives these initiatives.

    “I would just say that no amount of information is too much information, from personal experience,” said Munshi. “If we can get to the context of the request, the ‘why’ of the problem that we’re trying to solve right at the start, then there are hours of analysts’ work that you can save, initially, just by clearly defining that problem statement.”

    If teams are siloed, it will be harder to come up with the right answers.

    “Sometimes information doesn’t flow across all the teams evenly,” said Petersen.
    “And so a marketer might come to ask the analyst a question, and the analyst might provide an answer that may not be satisfactory, but the analyst may not be aware that [the question originally came from] your sales team.”

    Working through limitations

    As the marketing landscape continues to transform, new limitations come into play that might not have been relevant a short time ago. For instance, there might be new privacy regulations that guide how an organization can obtain or use data. This means that marketers and data analysts must be on the same page about the data problems they are trying to solve.

    “We now have limitations on certain data points which we didn’t have previously,” said Munshi. “It makes it challenging for marketers in this cookie-less environment to reach out to their consumers on a multitude of platforms. As a result, analysts are now tasked with trying to build personas or continue to target their customers with the same level of accuracy that they did in the past.”

    She added, “This does not mean that it is the end of all of this. We can definitely work together, both the analyst and the marketer, to come to a solution.”

    Dig deeper: Why marketers should care about consumer privacy


  • Why marketers should care about consumer privacy

    Why marketers should care about consumer privacy

    The U.S. is on the cusp of implementing a new national privacy law, the American Data Privacy and Protection Act (ADPPA). And while we may be late to the party (the EU’s General Data Protection Regulation, or GDPR, was implemented in 2018), it’s now high time for businesses to start paying attention to data and how it impacts consumer privacy.

    The new law will have significant implications for marketers, who will need to ensure they are handling consumer data in a responsible and transparent manner. Consumers, for their part, are more invested in maintaining control of their data and reluctant to exchange personal information (even for incentives) unless they trust that you’re being careful with their data.

    Nearly three-quarters of consumers rank data privacy as a top value, a recent report by MAGNA Media Trials and Ketch found.

    What is privacy in marketing?

    Privacy in marketing is all about data: specifically, an individual’s personal, identifiable or aggregate data and how companies collect it, use it, share it and forget it.

    The International Association of Privacy Professionals (IAPP) defines privacy as “the right to be left alone.” From a data privacy perspective, that means individuals have the right:

    • To understand how their data is being used.
    • To control who has access to it.
    • To tell a company to stop using it.
    • To have it deleted if they want.

    Privacy is not an all-or-nothing proposition. There are different levels of sensitivity when it comes to the types of data companies collect. For example, a consumer’s name and email address are not as sensitive as their health data (although with the implementation of ADPPA, that could change).

    Why marketers should care

    We can’t have all the shiny new marketing things (omnichannel experiences, customer centricity, personalization) without consumer data. But with big data comes big responsibility.

    It may have taken consumers, and especially U.S.
    consumers, a long time to become educated about the fact that brands engage in granular tracking of online behavior, using the data gathered for marketing purposes or even selling it.

    Ironically, U.S. brands were forced to take privacy seriously by European legislation (GDPR) because the worldwide reach of the internet meant brands could hardly guarantee to avoid engagement with European data subjects. The new horizon, however, is not just complying with applicable privacy laws; it’s being proactive about consumer privacy to build trust, establish community and secure loyalty.

    How important is privacy to consumers?

    Consumers care about privacy a lot, according to the MAGNA and Ketch survey, but this doesn’t mean they’re as focused on privacy compliance laws as, say, the entire digital marketing industry.

    Ninety percent of survey respondents had never heard of the Virginia Consumer Data Protection Act (VCDPA). But while people may not be closely following government-imposed privacy regulations, or how businesses comply with them, they’re paying attention to companies that get flagged for poor privacy practices.

    Even if consumers don’t know the acronyms as well as we do, they’re concerned about how businesses handle their data, with just 5% having no major concerns. Here are some top concerns, according to a recent survey by Tinuiti:

    • More than 50% of consumers agree that there’s no such thing as online privacy.
    • Roughly 40% to 50% of people (depending on age) think their mobile phones are listening to them.
    • 70% of consumers don’t like receiving targeted ads as a trade-off for providing their information.
    • Over 40% of consumers are very worried about criminals gaining access to their data.

    While it’s true that consumers are more aware of how companies use their data and have some concerns, they’re still mostly in the dark when it comes to a business’s privacy practices, which makes them suspicious.
    Nearly 60% of consumers in a recent BCG/Google survey think companies are selling their data, even though the reality is that most companies don’t do this.

    Marketers need to do a better job of educating consumers about how we use their data and what we do to protect it. We also need to be more transparent about how we use consumer data to personalize experiences. Familiarizing yourself with the types of data you’re collecting (and why) is a good start.

    The four types of consumer data

    Marketers have traditionally worked with three types of data: first-, second- and third-party data. More recently, a fourth type known as “zero-party data” has emerged (although it’s actually a subset of first-party data). Here’s an overview of each.

    Zero-party data

    The term zero-party data was coined by Fatemeh Khatibloo, VP principal analyst at Forrester Research. The term “declared data” might be a better descriptor, but Khatibloo placed the concept within the tiered hierarchy of first-, second- and third-party data.

    Basically, zero-party data is derived from a customer expressing a personal preference, be it the color of an item, clothing or shoe size, quantity, birthday, how they wish to receive information or even page settings.

    First-party data

    This is data you collect yourself, usually through your website or app. It includes information like names, email addresses, phone numbers and customer purchase histories. It can also include behavioral, location and customer interaction data (e.g., chatbot transcripts).

    You own this data. That is, you collected it and you can use it as you see fit, within the constraints of your region’s data privacy laws, of course.

    Second-party data

    This is data that another company shares with you, usually under the auspices of a partnership or some other type of business relationship. It could be something as simple as an email list that you purchased, or something more complicated like activity from apps, purchase history and proprietary research.
    The data, in this case, is owned by the company that collected it, but you have permission to use it.

    Third-party data

    This is data that you collect from sources that are not affiliated with you in any way; think of consumer data gathered by website cookies placed on someone’s browser as they surf the web.

    Third-party data is used widely by marketers to target and personalize ads. New privacy regulations require companies to get express permission from consumers to collect and use cookies or risk stiff penalties. Companies like Google, Apple and Mozilla are (or soon will be) eliminating support for third-party cookies to avoid these penalties.

    The new cookieless future will make it more difficult to target ads and personalize messaging. It’s the direct result of emerging consumer privacy laws like GDPR and CCPA.

    Privacy initiatives marketers should know about

    Some privacy laws, like the EU’s GDPR, Australia’s Consumer Data Right (CDR) law and California’s Consumer Privacy Act (CCPA), have already been passed.

    A fourth initiative, the U.S.’s ADPPA, is currently still cooking on the legislative stove. It’s been approved for a vote in the U.S. House of Representatives, and it would then have to pass in the Senate. If approved, it will be the first comprehensive national law governing how companies collect and use consumer data in the U.S.

    Here’s a (very high-level) breakdown of some important consumer privacy initiatives:

    • GDPR: The EU’s General Data Protection Regulation went into effect in 2018 and strengthens consumer privacy rights by, among other things, giving consumers the right to know what personal data is being collected about them, the right to have that data erased and the right to object to its use.
    • CCPA: California’s Consumer Privacy Act, signed into law in 2018, went into effect in January 2020. It gives Californians the right to know why and how businesses collect their data, plus what data is collected.
      It also gives consumers the right to opt out/withdraw consent and the right to be forgotten (e.g., have their data deleted).
    • CDPA: Virginia’s Consumer Data Protection Act will likely go into effect in 2023. As with CCPA, it gives consumers much more control over how companies obtain and use their data. It also places an emphasis on data security, meaning companies will be required to take reasonable steps to protect consumer data from unauthorized access, destruction, use, modification or disclosure.
    • ADPPA: The American Data Privacy and Protection Act, projected to pass in 2023, would (among other things) give consumers the right to know more about their data, including how companies collect, use and share it. It gives Americans the right to opt out of targeted advertising and provides “strong protections” for minors which minimize the collection and sharing of minors’ data.

    Of note, the ADPPA is the first consumer data privacy and security bill aimed at protecting Americans from what has essentially been unfettered access to and use of their data by U.S. businesses.

    It’s focused on reducing “commercial surveillance” and strictly regulates what data can be collected at all. It also limits how data can be used. Businesses absolutely need to understand what’s in this bill, which is why we took a deep dive into the specifics of the ADPPA’s main points, including how it will impact marketers.

    Privacy-enhancing technologies

    Businesses can get help from technology when it comes to addressing privacy issues. For example, both brands and publishers can take advantage of data “clean rooms.” Clean rooms are a type of privacy-enhancing technology (PET) that allows data owners to share customer first-party data in a privacy-compliant way. Clean rooms are secure spaces where first-party data from a number of brands can be resolved to the same customer’s profile while that profile remains anonymized.
    Closely related is “differential privacy.” This uses an algorithm that adds statistical noise to the data, enabling patterns in the data to be detected while information about individuals is shielded. There are many other types of PET.

    What does this mean for marketers?

    Consumer privacy laws like GDPR, CCPA and ADPPA impose strict rules around what, how and why data is collected. Meanwhile, consumers are becoming more invested in their own data, and in how companies use or misuse it. People want more control and transparency. They want to reclaim ownership of their information from the Googles and Amazons of the world.

    In addition to knowing the latest privacy regulations, marketers should better understand how consumers feel about data, including what data they’re willing to part with in exchange for incentives like discounts, freebies and convenience.

    Two-thirds of respondents in the BCG/Google survey said they like getting ads customized to their interests, but nearly half are worried about sharing their data. Younger generations will give up more data for fewer incentives than older consumers. And no matter what data you’re collecting, you need to cultivate trust and transparency with processes and technology that comply with data privacy laws and keep consumers informed.

    All of this requires that marketers create a privacy-first, transparent and resilient approach to data usage and data privacy. Increasingly, it also means you’ll have better control over consumer data collection preferences and usage if you have your own data rather than relying on second- or third-party data for your marketing initiatives.

    Opinions expressed in this article are those of the guest author and not necessarily MarTech.
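
To make the “statistical noise” idea behind differential privacy concrete, here is a minimal Python sketch of one common textbook approach, the Laplace mechanism applied to a counting query. It is an assumption chosen for illustration, not a description of what any vendor or clean room actually runs. The function name `noisy_count`, the privacy parameter `epsilon` and the sample ages are all made up for this example.

```python
import random


def noisy_count(values, predicate, epsilon, rng):
    """Return the number of matching records plus Laplace noise.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise and
    stronger protection for any single individual.
    """
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two independent exponential draws with rate
    # epsilon follows a Laplace distribution with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical customer ages; the query asks how many are under 30.
ages = [23, 35, 41, 29, 52, 27, 38]
rng = random.Random(0)  # seeded only so the sketch is repeatable

one_answer = noisy_count(ages, lambda a: a < 30, epsilon=1.0, rng=rng)
many = [noisy_count(ages, lambda a: a < 30, epsilon=1.0, rng=rng)
        for _ in range(5000)]
average = sum(many) / len(many)

print(f"single noisy answer: {one_answer:.2f}")
print(f"average of 5000 noisy answers: {average:.2f} (true count is 3)")
```

Any single answer is blurred enough that one person's presence is hard to infer, while the noise averages out across many records or queries, which is the pattern-versus-individual trade-off the article describes. Production systems also track how much privacy budget repeated queries consume, which this sketch does not attempt.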


  • 3 things customers expect from marketers to prove that they’re human

    3 things customers expect from marketers to prove that they’re human

    Marketers are using more automation tools to improve workflow efficiency and scale campaigns. But they have to remember that they’re marketing to humans, and that customers expect a human touch.

    There are three main values that customers expect at every touchpoint, according to Ali Schwanke, CEO and founder of B2B marketing consultancy Simple Strat, who spoke at The MarTech Conference. If marketers communicate these values, customers will be reassured that there is a human behind the automated tools.

    “There’s a reason why this year every company is getting back to strategy and innovation,” said Schwanke. “We’ve got the technology. Now, what the heck do we do with it to make sure we stand out and we don’t come across as a league of robots?”

    Dig deeper: Artificial intelligence is getting even smarter

    Customers expect empathy

    The first value is empathy. Customers want to feel like they’re being understood by a brand and that marketers are listening.

    “Can you put yourself in my shoes as a customer?” Schwanke asked. “Have you looked at all the emails you’re sending me and read them with a human voice?”

    She added, “We have to keep this in mind as we’re designing marketing automation and workflows and follow-ups and integrations, in order to better serve the customer.”

    Yes, marketers have goals to meet. They’re trying to generate leads and conversations. But the real needs are those of the consumer, and they need to feel like they’re not a number.

    Customers value transparency

    Customers need to be able to trust the brand they’re communicating with. Trust is the foundation of a relationship with the customer.

    To build that trust, brands need to be transparent about how they use customer data. And beyond that, they should give customers an idea of next steps in their journey.

    “If it says ‘sign up now’ and I don’t really know what’s coming next, [I’m wondering if] you are somehow going to find my credit card information again,” said Schwanke.
    Customers can’t see under the hood of your marketing automation tools, so transparency about next steps is crucial.

    “There’s a lot of suspicion out there about how all of that stuff works, so transparency is very important,” Schwanke said.

    Customers require responsiveness

    And finally, customers want to hear back when they send communications to the brand. That’s the ultimate reassurance that they’re being listened to by a human.

    Automation and AI-powered personalization can alienate a customer if that customer asks a question back and doesn’t receive an adequate response. For instance, if a personalized email comes into an inbox with the customer’s name, and that customer can’t respond without getting the email kicked back, then they will think it’s a scam.

    It’s up to marketers to take any surprises out of the customer relationship by being empathetic, transparent and responsive.

    “We live in a world where everything ahead of us is somewhat predictable,” said Schwanke. “We live in a very ‘surprise-less’ culture, and so customers have to know what’s ahead. You have to communicate with me (as a customer), and I don’t want to be treated like a number.”
