[Written on a romantic Qixi] Handling HTTP codes 206 and 302 when fetching data in Go

FeelTouch Labs 2022-08-06 18:29:03


Qixi is here! It is clearly a day to be happy and to celebrate, but if we programmers want to cut loose and enjoy it, the job at hand has to be done well first.

So this article takes two problems that programmers are likely to run into and walks through the problem, the analysis, and the solution, so that you can get the job done quickly and then spend the day with your other half without distractions.

Heads-up: for what follows, rein in the holiday mood for a moment and focus on the technical analysis.

Analyzing the case where an HTTP client gets code 206 back from the server

What does HTTP code 206 mean?

In one sentence: HTTP code 206 indicates that the server successfully received, understood, and accepted the request, but processed or returned only part of the data (Partial Content).

Why does HTTP code 206 occur?

Case 1

The client's request headers ask for only part of the resource, and the server supports Range requests.
The client states that it needs only part of the resource at the target URL by setting the Range field in the request header to indicate which part of the data to fetch, as in the following example:
[Figure: example request and response headers]
In the example request above, the Range header asks for 0-, that is, all data starting from byte 0.

The Range header can request one or more sub-ranges of the entity. Byte offsets are counted from 0, so a value of 0 refers to the first byte:
The first 500 bytes: Range: bytes=0-499
The second 500 bytes: Range: bytes=500-999
The last 500 bytes: Range: bytes=-500
Everything from byte 500 onward: Range: bytes=500-
The first and the last byte: Range: bytes=0-0,-1
Several ranges can also be specified at the same time: Range: bytes=500-600,601-999

In the example response above, the Content-Range header indicates that bytes 0-1048575 of a total of 2027665 bytes were returned. In other words, part of the data has not been returned yet, so the client has to make a second request with Range set to 1048576-2027664 (the last byte index is one less than the total length) before it has all the data. This shows that code 206 is well suited to implementing resumable downloads.
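To make the Range / Content-Range exchange concrete, here is a minimal Go sketch. It is only an illustration under assumptions: the local httptest server, the ten-byte payload, and the helper names fetchRange and newRangeServer are made up for the example and do not come from the original request.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchRange requests the inclusive byte range [start, end] of url and returns
// the status code, the Content-Range response header, and the body.
func fetchRange(url string, start, end int64) (int, string, []byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, resp.Header.Get("Content-Range"), body, err
}

// newRangeServer starts a local test server (an assumption for this sketch)
// that serves data and honors Range requests via http.ServeContent.
func newRangeServer(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "data.bin", time.Time{}, bytes.NewReader(data))
	}))
}

func main() {
	srv := newRangeServer([]byte("0123456789"))
	defer srv.Close()

	status, contentRange, body, err := fetchRange(srv.URL, 0, 4)
	if err != nil {
		panic(err)
	}
	// Prints: 206 bytes 0-4/10 01234
	fmt.Println(status, contentRange, string(body))
}
```

The number after the slash in Content-Range is the total length, which is what tells the client how much is still missing.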

The Content-Range response HTTP header indicates where in a full body message a partial message belongs.

Case 2

This case is more likely to come up when crawling other people's websites. Their server does not necessarily support range requests, but when it detects that you are an unauthorized external caller, its business logic returns only part of the data (for example, it should return 10 records but gives you only 5) and deliberately sets the response code to 206. For example:

* ~ curl -I https://dappradar.com/v2/api/dapps?params=UkdGd2NGSmhaR0Z5Y0dGblpUMHhKbk5uY205MWNEMXRZWGdtWTNWeWNtVnVZM2s5VlZORUptWmxZWFIxY21Wa1BURW1jbUZ1WjJVOWJXOXVkR2dtYzI5eWREMTFjMlZ5Sm05eVpHVnlQV1JsYzJNbWJHbHRhWFE5TWpZPQ==
HTTP/2 206
date: Thu, 04 Aug 2022 09:36:28 GMT
content-type: application/json
cache-control: private, must-revalidate
cache-control: no-cache
pragma: no-cache
expires: -1
x-cache-status: EXPIRED
x-cache-status: EXPIRED
x-frame-options: SAMEORIGIN
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 73564f0f483a3c34-HKG

Or when debugging in an IDE:
[Figure: IDE debugger showing status code 206]
As can be seen, although the command above returns code 206, the response contains no Content-Range field. This is mainly because the request is treated as illegitimate, so the server forcibly changes what would have been a 200 code to 206.

How to solve the problem of HTTP code 206 returning only part of the content

Solution for case 1

The principle above already makes this fairly clear: send requests one after another for successive byte ranges, and the server returns the data for each requested range. The thing to watch is that each request does not exceed the maximum number of bytes the server supports per request.
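The range-by-range approach could look roughly like the following Go sketch. It makes several assumptions: the server returns a single-range 206 with a numeric total in Content-Range, the chunk size is chosen by the caller, and the names downloadInChunks, totalSize, and newRangeServer are invented for the example. The local httptest server only stands in for a real range-capable server.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"time"
)

// downloadInChunks fetches url with successive Range requests of at most
// chunkSize bytes each and returns the concatenated body.
func downloadInChunks(url string, chunkSize int64) ([]byte, error) {
	var out []byte
	for start := int64(0); ; {
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, start+chunkSize-1))
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		if resp.StatusCode == http.StatusOK {
			// The server ignored Range and sent the whole resource.
			return body, nil
		}
		if resp.StatusCode != http.StatusPartialContent {
			return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		total, err := totalSize(resp.Header.Get("Content-Range"))
		if err != nil {
			return nil, err
		}
		out = append(out, body...)
		start += int64(len(body))
		if len(body) == 0 || start >= total {
			return out, nil
		}
	}
}

// totalSize extracts N from a header of the form "bytes a-b/N".
// It returns an error when the total is missing or unknown ("*").
func totalSize(contentRange string) (int64, error) {
	i := strings.LastIndex(contentRange, "/")
	if i < 0 {
		return 0, fmt.Errorf("malformed Content-Range %q", contentRange)
	}
	return strconv.ParseInt(contentRange[i+1:], 10, 64)
}

// newRangeServer is a local stand-in for a server that honors Range.
func newRangeServer(data []byte) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "data.bin", time.Time{}, bytes.NewReader(data))
	}))
}

func main() {
	srv := newRangeServer([]byte("0123456789"))
	defer srv.Close()

	data, err := downloadInChunks(srv.URL, 4) // requests 0-3, 4-7, 8-11
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // prints: 0123456789
}
```

Advancing by the number of bytes actually received, rather than by chunkSize, keeps the loop correct when the server returns fewer bytes than requested.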

Solution for case 2

The fix here is to turn your request into a legitimate one. Calling again then returns all the data, and the response code becomes 200. For the crawler problem above, for example, some experimentation showed that the request header was missing the cookie field. After adding it, all the data was returned correctly.
After the cookie was added to the request above, all the data was returned correctly and the code became 200.
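In Go, adding the cookie is a matter of setting one request header. The sketch below is illustrative only: the cookie name "session", its value, the sample JSON payloads, and the helpers fetchWithCookie and newPickyServer are assumptions. The local test server merely imitates the behavior described above (206 with partial data when the cookie is missing, 200 with everything when it is present).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchWithCookie issues a GET and, when cookie is non-empty, sends it in
// the Cookie request header. It returns the status code and the body.
func fetchWithCookie(url, cookie string) (int, string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	if cookie != "" {
		req.Header.Set("Cookie", cookie)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

// newPickyServer imitates a site that answers cookie-less requests with
// 206 and half of the records (an assumption made for this sketch).
func newPickyServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err != nil {
			w.WriteHeader(http.StatusPartialContent)
			io.WriteString(w, "[1,2,3,4,5]")
			return
		}
		io.WriteString(w, "[1,2,3,4,5,6,7,8,9,10]")
	}))
}

func main() {
	srv := newPickyServer()
	defer srv.Close()

	status, body, _ := fetchWithCookie(srv.URL, "")
	fmt.Println(status, body) // prints: 206 [1,2,3,4,5]

	status, body, _ = fetchWithCookie(srv.URL, "session=demo")
	fmt.Println(status, body) // prints: 200 [1,2,3,4,5,6,7,8,9,10]
}
```

Which header a given site actually checks has to be found by experiment, as the article did; the cookie is just the one that mattered in this case.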

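How to get the real request address when an HTTP client call returns code 302

What does HTTP code 302 mean?

Status code 302 indicates a redirect. After receiving this status code from the server, the browser automatically jumps to a new URL, which it takes from the Location header of the response (what the user sees is that the address A they typed instantly becomes another address B).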

Difference between HTTP codes 302 and 301

301 is a permanent redirect, while 302 is a temporary redirect.
Definition of 301: 301 Moved Permanently. The requested resource has been permanently moved to a new location, and any future references to this resource should use one of the URIs returned in this response. If possible, clients with link-editing capabilities should automatically update the requested address to the one returned by the server. Unless otherwise specified, this response is cacheable.
Definition of 302: 302 Found. The requested resource temporarily responds from a different URI. Because the redirect is temporary, the client should continue to send future requests to the original address. The response is cacheable only if Cache-Control or Expires says so.

A server-side jump leaves the content of the address bar unchanged (the client browser's address bar does not show the destination URL). After the client's request arrives, the server finds that the current resource cannot produce the response and requests another resource internally. The client therefore does not know that a jump took place, and the whole thing counts as one request.
A client-side jump changes the content of the address bar. The client's request reaches the server, the server replies with a "go visit another link" response, and the client then sends a new request to the URL given in that response, so there are two requests.

How to get the actual redirect address for HTTP code 302

302 is a redirect, and Go's HTTP client follows redirects automatically by default, so a plain GET does not hand you the real address from the first response. On a 302 redirect, the real address is in the Location header of the response.
By default Go follows up to 10 consecutive redirects, so unless you break out earlier, the client keeps following redirects and gives up after the 10th. This behavior can be customized.

Method 1: get the real URL with GET

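// Requires the imports "fmt", "net/http" and "time". log is assumed to be the
// project's own logger package exposing a zap SugaredLogger via log.Sugar().
func ParseRealUrl(originUrl string) string {
	client := &http.Client{
		Timeout:       15 * time.Second,
		CheckRedirect: checkRedirect,
	}
	req, err1 := http.NewRequest("GET", originUrl, nil)
	if err1 != nil {
		log.Sugar().Errorf("NewRequest err:%#v", err1.Error())
		return ""
	}
	// Set headers only after the error check: req is nil when NewRequest fails.
	req.Header.Set("user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36")
	resp, err2 := client.Do(req)
	// Note: do not return just because err2 != nil. checkRedirect intercepts
	// the redirect by returning an error, and resp is still non-nil then.
	if err2 != nil {
		log.Sugar().Errorf("Do err:%#v", err2.Error())
	}
	// For any other failure (for example a network error) resp is nil.
	if resp == nil {
		return ""
	}
	defer resp.Body.Close()
	respUrl, err3 := resp.Location()
	if err3 != nil {
		log.Sugar().Errorf("Location err:%#v", err3.Error())
		return ""
	}
	return respUrl.String()
}

// checkRedirect stops the client at the first redirect.
func checkRedirect(req *http.Request, via []*http.Request) error {
	if len(via) >= 1 {
		return fmt.Errorf("checkRedirect len(via) >= 1")
	}
	return nil
}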

When creating the client, set CheckRedirect to our own checkRedirect function. It checks "len(via) >= 1", so it stops at the first redirect, and the real URL can then be read from the response.

Method 2: get the real URL with HEAD


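// Requires the imports "fmt", "net/http" and "time". log is assumed to be the
// project's own logger package exposing a zap SugaredLogger via log.Sugar().
func ParseRealUrl(originUrl string) string {
	client := &http.Client{
		Timeout:       15 * time.Second,
		CheckRedirect: checkRedirect,
	}
	req, err1 := http.NewRequest("HEAD", originUrl, nil)
	if err1 != nil {
		log.Sugar().Errorf("NewRequest err:%#v", err1.Error())
		return ""
	}
	// Set headers only after the error check: req is nil when NewRequest fails.
	req.Header.Set("user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36")
	resp, err2 := client.Do(req)
	// An error is expected here when checkRedirect intercepts the redirect.
	if err2 != nil {
		log.Sugar().Infof("Do err:%#v", err2.Error())
	}
	// For any other failure (for example a network error) resp is nil.
	if resp == nil {
		return ""
	}
	defer resp.Body.Close()
	return resp.Header.Get("Location")
}

// checkRedirect stops the client at the first redirect.
func checkRedirect(req *http.Request, via []*http.Request) error {
	if len(via) >= 1 {
		return fmt.Errorf("checkRedirect len(via) >= 1")
	}
	return nil
}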

Romantic reminder: once you are over the mountain of technology, you can go cut loose with your other half~


Copyright: author [FeelTouch Labs]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/218/202208061820407762.html