I'm running into the strangest situation, in which a site (http://ift.tt/1t8MUSO) erroneously returns a 404 response code.
I'm writing an automated test suite that detects URLs with errors. The page renders normally in the browser, although the browser's network monitor also shows a 404 for it.
Is there any technical way to validate this page as not an error page, while correctly detecting other pages (e.g. http://ift.tt/15nLloM) as 404s?
# -*- coding: utf-8 -*-
import traceback
import urllib2
import httplib
url = 'http://ift.tt/1t8MUSO'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    #'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
}
request = urllib2.Request(url, headers=HEADERS)
try:
    response = urllib2.urlopen(request)
    response_header = response.info()
    print "Success: %s - %s" % (response.code, response_header)
# HTTPError is a subclass of URLError, so it has to be caught first.
except urllib2.HTTPError as e:
    print 'urllib2.HTTPError %s - %s' % (e.code, e.headers)
except urllib2.URLError as e:
    print "Unknown URLError: %s" % (e.reason)
except httplib.BadStatusLine as e:
    print "Bad Status Error. (Presumably, the server closed the connection before sending a valid response)"
except Exception:
    print "Unknown Exception: %s" % (traceback.format_exc())
When run, this script prints:
urllib2.HTTPError 404 - Server: nginx
Content-Type: text/html; charset=utf-8
X-Drupal-Cache: HIT
Etag: "1422054308-1"
Content-Language: en
Link: </node/1523879>; rel="shortlink",</404>; rel="canonical",</node/1523879>; rel="shortlink",</404>; rel="canonical"
X-Generator: Drupal 7 (http://drupal.org)
Cache-Control: public, max-age=21600
Last-Modified: Fri, 23 Jan 2015 23:05:08 +0000
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie,Accept-Encoding
Content-Encoding: gzip
X-Request-ID: v-82b55230-a357-11e4-94fe-1231380988d9
X-AH-Environment: prod
Content-Length: 11441
Accept-Ranges: bytes
Date: Fri, 23 Jan 2015 23:28:17 GMT
X-Varnish: 2729940224
Age: 0
Via: 1.1 varnish
Connection: close
X-Cache: MISS
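The 404 above still carries a body (Content-Length: 11441, gzip-encoded), and urllib2.HTTPError can be read like a normal response, so one option is to inspect that body and the Link header before deciding. Below is a minimal sketch of that idea, not a definitive answer: the helper names decode_body and classify_404 are hypothetical, and the marker strings passed in are an assumption that would need tuning per site.

```python
import gzip
import io


def decode_body(raw_body, content_encoding):
    """Return the response body, gunzipping it if Content-Encoding says gzip."""
    if content_encoding and 'gzip' in content_encoding.lower():
        return gzip.GzipFile(fileobj=io.BytesIO(raw_body)).read()
    return raw_body


def classify_404(body, link_header, error_markers):
    """Guess whether a 404 response carries a real page or an error page.

    Heuristic only (an assumption, not a standard): returns 'error page' if
    the Link header names /404 as canonical, or if the body contains one of
    the caller-supplied marker strings; otherwise returns 'possibly valid'.
    """
    if '</404>; rel="canonical"' in (link_header or ''):
        return 'error page'
    text = body.decode('utf-8', 'replace').lower()
    for marker in error_markers:
        if marker.lower() in text:
            return 'error page'
    return 'possibly valid'
```

Inside the `except urllib2.HTTPError` branch this could be called as `classify_404(decode_body(e.read(), e.headers.get('Content-Encoding')), e.headers.get('Link'), ['page not found'])`. Note that the Link header in the output above already contains `</404>; rel="canonical"`, so this particular check would flag the first URL as an error page; whether that signal can be trusted for this site would have to be verified against the page content.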