Google now uses BERT to match stories with fact checks
Google has made numerous changes to auto-complete, News, fact-checking, knowledge panels, breaking news detection and more.
At a high level, Google is aiming to provide search results that are not just as relevant as possible, but also as reliable as possible. In some cases, such as the YMYL (your money, your life) sector, reliability is a higher concern for Google, especially with the U.S. presidential election around the corner and health a top concern during the pandemic.
What has changed? Pandu Nayak, Google Fellow and Vice President of Google Search, said today’s announcement was more about the ongoing changes Google has made over the years than about a new product or feature launch. Here are recent tweaks and changes the company highlighted from the past year:
- Auto-complete policy change around elections: Google is being more conservative and showing fewer suggestions rather than more in this area.
- Google BERT being used in full coverage news stories to better match fact checks with stories.
- Fact check labels have been shown over 4 billion times in 2020.
- Google works more closely with Wikipedia to detect and remove vandalism that Google might otherwise use in knowledge panels.
- Google is now able to detect breaking news queries in a few minutes versus 40+ minutes.
BERT and full coverage. Google is now using BERT, one of its language AI models, to better understand how stories in the Google News Full Coverage area line up with fact checks on the web. In effect, Google can see the connections between the articles and the fact check database and so better match fact checks with stories. In particular, this helps it understand whether a fact check is related to the main topic of a story.
Pandu wrote, “we also just launched an update using our BERT language understanding models to improve the matching between news stories and available fact checks. These systems can better understand whether a fact check claim is related to the central topic of a story, and surface those fact checks more prominently in Full Coverage — a News feature that provides a complete picture of how a story is reported from a variety of sources. With just a tap, Full Coverage lets you see top headlines from different sources, videos, local news reports, FAQs, social commentary, and a timeline for stories that have played out over time.”
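To make the idea of matching a fact check to a story's central topic concrete, here is a minimal sketch. It is not Google's system and it does not use BERT: it swaps in a plain bag-of-words cosine similarity from the Python standard library as a stand-in. The function names, the stopword list and the 0.3 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use something far richer.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "that", "for", "on", "by", "with"}


def tokens(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Bag-of-words cosine similarity (a crude stand-in for a language model)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def match_fact_checks(story, fact_checks, threshold=0.3):
    """Return fact checks whose similarity to the story meets the (assumed) threshold, best first."""
    scored = [(cosine(story, fc), fc) for fc in fact_checks]
    return [fc for score, fc in sorted(scored, reverse=True) if score >= threshold]
```

A word-overlap score like this misses paraphrases and context, which is the kind of gap a language understanding model such as BERT is meant to close.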
Breaking news. Google said it can now detect breaking news queries within a few minutes of the news breaking, whereas in the past it could take over 40 minutes. These types of queries would sometimes surface inaccurate information. Because Google can now detect them much faster, it can adjust early on which types of sites it wants to show for those queries, in this case favoring more authoritative results that align with E-A-T (expertise, authoritativeness, trustworthiness).
You should expect more accurate and reliable information from Google around breaking news topics.
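Google did not describe how it detects breaking news queries. As a rough illustration of what "detecting a query within a few minutes" could look like, the sketch below flags a query when its volume over the last few minutes rises well above its earlier baseline. The function name, the per-minute counts, the window size and the multiplier `k` are all assumptions, not details from Google.

```python
from statistics import mean, pstdev


def is_spiking(counts_per_minute, window=3, k=3.0):
    """Return True if the last `window` minutes average well above the earlier baseline.

    counts_per_minute: query counts, oldest first (hypothetical input format).
    k: how many standard deviations above the baseline mean counts as a spike.
    """
    if len(counts_per_minute) <= window:
        return False  # not enough history to form a baseline
    baseline = counts_per_minute[:-window]
    recent = counts_per_minute[-window:]
    threshold = mean(baseline) + k * pstdev(baseline)
    return mean(recent) > threshold
```

With a three-minute window, a sudden surge is flagged about three minutes after it starts, which is the kind of latency the announcement describes.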
Auto-complete policy changes. David Graff, Senior Director, Trust & Safety at Google, said that around the elections specifically, but also in some other areas, Google is going to take a more conservative approach to the suggestions it shows in auto-complete. Google would rather not show a suggestion in auto-complete than show an inaccurate one. So around elections and some other areas, Google may show fewer suggestions rather than more.
Pandu explained that Google has “expanded Autocomplete policies related to elections, and we will remove predictions that could be interpreted as claims for or against any candidate or political party.” “We will also remove predictions that could be interpreted as a claim about participation in the election—like statements about voting methods, requirements, or the status of voting locations—or the integrity or legitimacy of electoral processes, such as the security of the election,” he explained. One example given by David Graff was that a query like [you can vote by mail by texas] might not be shown, whether or not it is true. One important note is that “whether or not a prediction appears, you can still search for whatever you’d like and find results,” David explained.
Fact check label shown 4 billion times. Google said that so far in 2020, the fact check label has been shown over 4 billion times in search, which is already more than in all of 2019. Google has expanded the fact check label across news, search, images and other areas over the past few years.
Knowledge graph and Wikipedia. Google has been investing heavily alongside Wikipedia to detect and reduce vandalism within Wikipedia. Since Google very often uses Wikipedia as a source for its knowledge panels and featured snippets, it has an incentive to ensure those Wikipedia entries are reliable and accurate. Most issues are corrected within Wikipedia within minutes, Google said.
“To complement Wikipedia’s systems, we’ve added additional protections and detection systems to prevent potentially inaccurate information from appearing in knowledge panels. On rare occasions, instances of vandalism on Wikipedia can slip through. Only a small proportion of edits from Wikipedia are potential vandalism, and we’ve improved our systems to now detect 99 percent of those cases. If these issues do appear, we have policies that allow us to take action quickly to address them. To further support the Wikipedia community, we created the WikiLoop program last year that hosts several editor tools focused on content quality. This includes WikiLoop DoubleCheck, one of a number of tools Wikipedia editors and users can use to track changes on a page and flag potential issues. We contribute data from our own detection systems, which members of the community can use to uncover new insights,” Google wrote.
Search quality raters and guidelines. Google explained that many of its benchmarks and criteria are listed in its Search Quality Raters Guidelines. There, Google clearly documents its goals for the search results, which types of queries require more authoritative and reliable sources, and which do not. In fact, Pandu explained that Google trains its query classifiers to understand whether a query pertains to the YMYL category (we know this). So when new issues arise, Google’s systems are ready to handle those queries, he said.
To understand what is reliable, particularly in areas such as health and elections, Google had to define what reliable and high quality mean directly in its search quality raters guidelines. Google takes the feedback from the quality raters, feeds it into machine learning models, and sends it back to its engineers to improve search overall. Again, Google does not use these raters directly in search, and these ratings do not directly influence the search rankings of individual queries or sites.
Google has more than 10,000 raters around the world, including one in every single state, in order to get a representative view of all searchers. These raters rate the search results based on the quality raters guidelines. Raters look at side-by-side experiments. Google ran over 60,000 side-by-side experiments and almost 400,000 search quality tests, with over 1 million over the past four years; that is about 1,000 tests per day.
Why we care. Google is constantly tweaking Google Search to improve search quality, relevancy, reliability and accuracy. That means you need to constantly improve your website to ensure you have the highest quality, most relevant, most reliable and most accurate content and user experience.