Another week, another workflow. This week we'll discuss how to find the date and time of web content even when it isn't obvious. This will help you establish a timeline of content or determine whether an article has been altered since its original publication. You don't have to follow these steps in order. You can use any of these techniques at any point in your investigation, and you may find some more reliable than others. What matters is knowing the full menu of options available to you when the date/timestamp of a piece of content is important to your workflow.
Step 1: Check the URL
This step might seem obvious, but it's often overlooked. A lot of content, particularly blog posts, has the date in the URL of the post. Because the date is in the URL, you don't even have to visit the page to see when it was published. If you're collecting information at scale, this can be useful for sorting content chronologically without having to load and scan each page.
Here's an example from an OSINT blog hosted on WordPress:
Notice the 2017/02/25 within the URL.
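If you're working through a long list of URLs, you can pull the date out of each one and sort them without fetching any pages. Here's a minimal sketch, assuming the site uses a /YYYY/MM/ or /YYYY/MM/DD/ path layout like the example above; the function names and sample URLs are hypothetical, and sites that format dates differently (for example /20170225/) won't match.

```python
import re
from datetime import date

# Matches /YYYY/MM/ or /YYYY/MM/DD/ in a URL path, the layout many blog
# permalinks use. Assumption: other date formats in URLs will not match.
URL_DATE = re.compile(
    r"/((?:19|20)\d{2})/(0[1-9]|1[0-2])(?:/(0[1-9]|[12]\d|3[01]))?(?=/|$)"
)


def date_from_url(url):
    """Return the date embedded in the URL path, or None if there isn't one.

    When only a year and month are present, the day defaults to 1.
    """
    match = URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = match.groups()
    try:
        return date(int(year), int(month), int(day or 1))
    except ValueError:  # e.g. /2017/02/30/ is not a real calendar date
        return None


def sort_by_url_date(urls):
    """Return (date, url) pairs for URLs that carry a date, oldest first."""
    dated = [(date_from_url(u), u) for u in urls]
    return sorted((d, u) for d, u in dated if d is not None)


if __name__ == "__main__":
    # Hypothetical URLs for illustration.
    sample = [
        "https://example.com/2019/07/another-post/",
        "https://example.com/2017/02/25/some-post/",
        "https://example.com/about/",
    ]
    for d, u in sort_by_url_date(sample):
        print(d.isoformat(), u)
```

Run on the sample list, this prints the 2017-02-25 post first and the 2019-07-01 post second, and drops /about/ because it carries no date. A month-only URL gets day 1, so treat those dates as "sometime that month" rather than an exact day.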
Step 2: Check the Sitemap
Checking a website's sitemap can be a quick way to get a chronological footprint of its published content. While the sitemap won't give you the original publication date/time of a post, it'll usually show the last modified date and time. This is great for discovering new content on a site as well as monitoring changes over time.
Simply add /sitemap.xml to the end of a website's domain to check whether the site has a sitemap. Some websites use a custom URL for the sitemap rather than the default; these are often listed on a Sitemap: line in the site's /robots.txt file.
Above is a screenshot of the sitemap at https://jakecreps.com/sitemap.xml.
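Reading a long sitemap in a browser gets tedious, so you may want to pull out each URL and its last modified date and sort them. The sketch below uses Python's standard-library XML parser and assumes the sitemap follows the sitemaps.org format. It handles both a regular sitemap (a list of pages) and a sitemap index (a list of child sitemaps, which many blogging platforms serve at /sitemap.xml; you'd then fetch each child). The function names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch(url):
    """Download a page as text (network access required)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_sitemap(xml_text):
    """Return (lastmod, loc) pairs from a sitemap, most recently modified first.

    Handles both a <urlset> (pages) and a <sitemapindex> (child sitemaps).
    Entries without a <lastmod> are skipped.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:url", NS) + root.findall("sm:sitemap", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if loc and lastmod:
            entries.append((lastmod, loc))
    # lastmod uses W3C Datetime (ISO 8601), so plain string sorting is a
    # reasonable approximation as long as the site uses a consistent UTC offset.
    entries.sort(reverse=True)
    return entries


if __name__ == "__main__":
    # Offline sample with hypothetical URLs; to check a live site, use e.g.
    # parse_sitemap(fetch("https://jakecreps.com/sitemap.xml")).
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old-post/</loc><lastmod>2020-05-01</lastmod></url>
  <url><loc>https://example.com/new-post/</loc><lastmod>2021-03-03T14:00:00+00:00</lastmod></url>
</urlset>"""
    for lastmod, loc in parse_sitemap(sample):
        print(lastmod, loc)
```

Sorting newest first makes recently changed pages stand out, which is handy when you revisit a site to monitor changes.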
Step 3: Check the Source Code
If you can't find the date in the URL and there's no sitemap available, check the source code. Publications typically store the publication or modified date in meta tags within the head tag of the page.
Above is a screenshot of the source code for my blog. Notice the published_time and modified_time of the article. You'll also find a variety of other useful information, including images and other metadata.
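You can also extract these tags programmatically. Here's a minimal sketch using Python's built-in HTML parser to collect date-like meta tags. It assumes the page uses the Open Graph article:published_time / article:modified_time properties seen above, plus a few other common meta names; that list is an assumption, and some sites put their dates in JSON-LD or elsewhere instead. The class, function, and sample values are hypothetical.

```python
from html.parser import HTMLParser

# <meta> keys that commonly hold dates. The article:* properties come from
# the Open Graph protocol; the rest are common conventions. Assumption: a
# given site may use none of these, or put dates in JSON-LD instead.
DATE_KEYS = {
    "article:published_time",
    "article:modified_time",
    "og:updated_time",
    "date",
    "pubdate",
}


class MetaDateParser(HTMLParser):
    """Collect date-like <meta> values keyed by their property/name."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = (attrs.get("property") or attrs.get("name") or "").lower()
        content = attrs.get("content")
        if key in DATE_KEYS and content:
            self.dates[key] = content


def meta_dates(html):
    parser = MetaDateParser()
    parser.feed(html)
    parser.close()
    return parser.dates


if __name__ == "__main__":
    # Hypothetical <head> snippet.
    sample = """<head>
<meta property="article:published_time" content="2021-03-03T12:00:00.000Z">
<meta property="article:modified_time" content="2021-03-04T09:30:00.000Z" />
</head>"""
    print(meta_dates(sample))
```

If the published and modified values differ, that's a signal the article was edited after publication, which is exactly the kind of discrepancy worth noting in your timeline.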
Step 4: Check Google
Google regularly indexes web pages, especially if they've been modified. If you copy the URL of interest and paste it after inurl: in a Google search, the date Google last crawled the page will appear in the results. Sometimes this will be the original publication date. This is also a good way to check pages across multiple domains if your investigation is broader.
Above is last week's OSINT Workflow Wednesday post. As you can see, Google indexed it "7 days ago," which is when it was published. However, before jumping to conclusions about the original publication date, make sure to confirm it with the other steps in this workflow.
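When you have many URLs across several domains, building the inurl: searches by hand gets repetitive. This small sketch only generates the search links so you can open them in a browser and read the dates yourself; it deliberately doesn't scrape Google's results pages, since automated querying tends to get blocked. The function name and target URLs are hypothetical.

```python
from urllib.parse import urlencode


def google_inurl_link(target_url):
    """Build a Google search link for the query inurl:<target_url>."""
    return "https://www.google.com/search?" + urlencode({"q": f"inurl:{target_url}"})


if __name__ == "__main__":
    # Hypothetical targets; open each printed link in a browser and note
    # the date shown next to the result.
    targets = [
        "https://example.com/2021/03/some-post/",
        "https://example.org/news/another-story",
    ]
    for target in targets:
        print(google_inurl_link(target))
```

urlencode percent-encodes the colon and slashes in the query, so the full URL survives intact as part of the search.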
Step 5: Check Social Media
If you still haven't found the date and time, there's more you can do. If you copy and paste the URL into Twitter's search, for example, and find the oldest tweet, you'll get an estimated publication date.
Above is a screenshot showing March 3, 2021, which is when that article was published. If the URL doesn't turn anything up, try the article title instead; alternatively, search for the author on social media and see when they shared it.
Step 6: Check the Images
Since you've already checked the source code, you'll be familiar with this next step. Images are often stored by default in a folder path that includes the upload date. Inspecting the source URL of an image will give you a general idea of when it was uploaded and, therefore, when the article was published.
The above snippet shows the image URL https://jakecreps.com/content/images/size/w960/2021/03/workspace-02-01.png. The 2021/03 in the path tells us the image was uploaded in March 2021. Not an exact date/time, but a general idea.
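To check every image on a page at once, you can collect the img src values (plus the og:image meta tag) and pull out any /YYYY/MM/ segment. This is a minimal sketch that assumes the platform stores uploads under year/month folders, as in the URL above. Many blogging platforms do this by default, but not all. The names and sample HTML are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches a /YYYY/MM/ segment, as in .../images/size/w960/2021/03/file.png.
# Assumption: the platform puts the upload year/month in the image path.
YEAR_MONTH = re.compile(r"/((?:19|20)\d{2})/(0[1-9]|1[0-2])/")


class ImageSrcParser(HTMLParser):
    """Collect image URLs from <img src> and <meta property="og:image">."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.srcs.append(attrs["src"])
        elif tag == "meta" and attrs.get("property") == "og:image" and attrs.get("content"):
            self.srcs.append(attrs["content"])


def image_upload_months(html):
    """Return the distinct upload months (YYYY-MM) found in image URLs, oldest first."""
    parser = ImageSrcParser()
    parser.feed(html)
    parser.close()
    months = set()
    for src in parser.srcs:
        match = YEAR_MONTH.search(src)
        if match:
            months.add(f"{match.group(1)}-{match.group(2)}")
    return sorted(months)


if __name__ == "__main__":
    sample = (
        '<img src="https://jakecreps.com/content/images/size/w960/2021/03/workspace-02-01.png">'
        '<img src="/static/logo.png">'
    )
    print(image_upload_months(sample))
```

For the sample this prints ['2021-03']. Keep in mind that an image can be reused from an older upload, so an older month on one image doesn't necessarily mean the article itself is older; the most recent month is often the better hint.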
Step 7: Check the Comments
If the page is a news article, a blog post, or similar, it might have comments enabled. Find the first comment and check its timestamp to get a general idea of when the page was published. If the comment doesn't have a visible timestamp, make sure to check the source code as well.
Above is a comment from nixintel's blog. As you can see, the first comment is from February 8, 2021 at 3:46 am (my time). The article was published on February 7, 2021. Once again, not an exact date, but close enough to establish a timeline.
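Because comment times are often displayed in your local time zone, as in the "my time" example above, it helps to read the machine-readable timestamp from the source and normalize it to UTC. The sketch below assumes the comments mark their dates with HTML time tags carrying a datetime attribute, which is common but not universal. It collects every time tag it sees, so on a real page you may need to narrow it to the comments section to avoid picking up the article's own date. All names and sample values are hypothetical.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class TimeTagParser(HTMLParser):
    """Collect the datetime attribute of every <time> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.values.append(value)


def parse_iso(value):
    """Parse an ISO 8601 timestamp and convert it to UTC."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: treat timestamps without an offset as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def earliest_time(html):
    """Return the earliest <time datetime=...> on the page in UTC, or None."""
    parser = TimeTagParser()
    parser.feed(html)
    parser.close()
    parsed = []
    for value in parser.values:
        try:
            parsed.append(parse_iso(value))
        except ValueError:
            continue  # skip values that aren't ISO 8601
    return min(parsed) if parsed else None


if __name__ == "__main__":
    # Hypothetical comment list.
    sample = """<ol class="comments">
  <li><time datetime="2021-02-09T10:15:00+00:00">February 9, 2021</time></li>
  <li><time datetime="2021-02-08T08:46:00Z">February 8, 2021</time></li>
</ol>"""
    print(earliest_time(sample))
```

Comparing everything in UTC avoids off-by-one-day mistakes when the commenter, the site, and you are in different time zones.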
Step 8: Check the Archives
Using a tool like Web Archives, you can quickly check whether the page you're looking at has been archived. The earliest archive date can help you determine a date range or spot discrepancies.
Another valuable web-based tool is "Carbon Dating the Web" by Old Dominion University. It checks multiple archives at once and prints out the date/time stamp. It, too, is open source.
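If you want the earliest capture from a script, the Internet Archive's Wayback Machine exposes a CDX search endpoint. Here's a minimal sketch that only queries the Wayback Machine, not the multiple archives the tools above cover. It assumes that with output=json the response is a list of rows whose first row holds the field names, and that captures come back oldest first so limit=1 returns the earliest one. Check the CDX documentation before relying on it. The function names and sample response are hypothetical.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target_url):
    """Build a CDX query asking for the first (oldest) capture of target_url."""
    params = {"url": target_url, "output": "json", "limit": "1", "fl": "timestamp,original"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def earliest_capture(cdx_json_text):
    """Return (capture_time_utc, original_url) from a CDX JSON response, or None."""
    rows = json.loads(cdx_json_text) if cdx_json_text.strip() else []
    if len(rows) < 2:
        return None  # empty, or header row only: no captures
    timestamp, original = rows[1][0], rows[1][1]
    # Wayback timestamps are 14-digit YYYYMMDDhhmmss strings in UTC.
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


def fetch(url):
    """Download a response body as text (network access required)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Live lookup (hypothetical target):
    # print(earliest_capture(fetch(cdx_query_url("example.com/2017/02/25/some-post/"))))
    sample = '[["timestamp","original"],["20170226093000","https://example.com/2017/02/25/some-post/"]]'
    print(earliest_capture(sample))
```

The earliest capture only tells you the page existed by that moment, so treat it as an upper bound on the publication date and cross-check it against the earlier steps.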
Thanks for reading. If you enjoyed this post, make sure to subscribe. A new one just like this will be posted every Wednesday at 6:00 PM UTC-5:00.