1. Chapter 2: TCP Congestion Control (TCP拥塞控制)
江勇, jiangysz.tsinghua.edu.cn

2. Outline
- Window Flow Control
- Analysis of TCP Congestion Control
- Tahoe, Reno, NewReno, SACK, Vegas
- Beyond TCP Congestion Control

3. Outline
- Window Flow Control
- Analysis of TCP Congestion Control
- Tahoe, Reno, NewReno, SACK, Vegas
- Beyond TCP Congestion Control

4. Problem
- Flow control: make sure that the receiver can receive as fast as you send
- Congestion control: make sure that the network delivers the packets to the receiver

5. Window Flow Control
- Lost packet detected by ACK
- Approximately W packets transmitted per RTT
- Higher window → higher throughput
- Self-clocking
[Figure: source and destination timelines showing a window of W segments sent per RTT]

6. Early TCP
- Pre-1988
- Go-back-N ARQ: detects loss from timeout, retransmits from the lost packet onward
- Flow control: self-clocking

7. Source Rate
- Limit the number of packets in the network to window W
- Source rate = W × MSS × 8 / RTT bps (see the sketch after slide 17 below)
- If W too small, then rate < capacity
- If W too big, then rate > capacity → congestion

8. Why Do You Care About Congestion Control?
- Otherwise you get congestion collapse. How might this happen?
- Assume the network is congested (a router drops packets)
- You learn the receiver didn't get the packet, either by ACK, NACK, or timeout
- What do you do? Retransmit the packet
- Still the receiver didn't get the packet
- Retransmit again ... and so on
- And now assume that everyone is doing the same!

9. Why Do You Care About Congestion Control? (cont.)
- The network becomes more and more congested, and with duplicate packets rather than new packets!
- October 1986: the Internet had its first congestion collapse
  - Link from LBL to UC Berkeley: 400 yards, 3 hops, 32 Kbps
  - Throughput dropped to 40 bps: roughly a factor of 1000 drop!
- 1988: Van Jacobson proposed TCP congestion control

10. Solutions?
- Increase buffer size. Why not?
- Slow down: if you know that your packets are not delivered because of network congestion, slow down
- Questions: How do you detect network congestion? By how much do you slow down?

11. Outline
- Window Flow Control
- Analysis of TCP Congestion Control
- Tahoe, Reno, NewReno, SACK, Vegas
- Beyond TCP Congestion Control

12. What's Really Happening?
- Knee point, after which throughput increases very slowly and delay increases fast
- Cliff point, after which throughput decreases very fast to zero (congestion collapse) and delay approaches infinity
- Note: in an M/M/1 queue, Delay = 1 / (1 − utilization)
[Figure: throughput vs. load and delay vs. load, marking the knee, the cliff, packet loss, and congestion collapse]

13. Congestion Control vs. Congestion Avoidance
- Congestion control goal: stay left of the cliff
- Congestion avoidance goal: stay left of the knee
[Figure: throughput vs. load with knee, cliff, and congestion collapse]

14. Goals
- Operate near the knee point
- Remain in equilibrium
- How to maintain equilibrium? Don't put a packet into the network until another packet leaves. How do you do it?
- Use ACKs: send a new packet only after you receive an ACK. Why?
- Maintain the number of packets in the network "constant"

15. How Do You Do It?
- Detect when the network approaches/reaches the knee point, and stay there
- Questions: How do you get there? What if you overshoot (i.e., go over the knee point)?
- Possible solution: increase the window size until you notice congestion; decrease the window size if the network is congested

16. Detecting Congestion: Explicit Network Signal
- Send a packet back to the source (e.g. ICMP Source Quench); control traffic → congestion collapse
- Set a bit in the header (e.g. DEC DNA/OSI Layer 4 [CJ89], ECN); can be subverted by a selfish receiver [SEW01]
- Unless on every router, still need an end-to-end signal
- Could be robust, if deployed

17. Detecting Congestion: Implicit Network Signal
- Loss (e.g. TCP Tahoe, Reno, NewReno, SACK): + relatively robust, − no avoidance
- Delay (e.g. TCP Vegas): + avoidance, − difficult to make robust
- Easily deployable
- Robust enough? Wireless?
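To make the source-rate relation on slide 7 concrete, here is a minimal Python sketch. It is not from the deck; the helper names and the example link parameters (10 Mb/s, 1460 B segments, 100 ms RTT) are my own assumptions. It computes the window-limited rate W × MSS × 8 / RTT and compares it with link capacity, reproducing the "W too small / W too big" cases.

def window_limited_rate_bps(w_packets: int, mss_bytes: int, rtt_s: float) -> float:
    """Source rate when at most w_packets segments of mss_bytes are in flight per RTT."""
    return w_packets * mss_bytes * 8 / rtt_s

def pipe_size_packets(capacity_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Bandwidth-delay product in packets: the window W that just fills the pipe."""
    return capacity_bps * rtt_s / (mss_bytes * 8)

# Assumed example link, not from the slides: 10 Mb/s, 1460 B segments, 100 ms RTT.
capacity, mss, rtt = 10e6, 1460, 0.1
w_star = pipe_size_packets(capacity, mss, rtt)   # about 86 packets
for w in (10, int(w_star), 4 * int(w_star)):
    rate = window_limited_rate_bps(w, mss, rtt)
    status = "rate < capacity (underused)" if rate < capacity else "rate >= capacity -> queueing/congestion"
    print(f"W = {w:3d} packets: {rate/1e6:6.2f} Mb/s, {status}")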
18. Efficient Allocation
- Too slow: fail to take advantage of available bandwidth → underload
- Too fast: overshoot the knee → overload, high delay, loss
- Everyone's doing it: may all under/overshoot → large oscillations
- Optimal: Σxi = Xgoal
- Efficiency = 1 − distance from the efficiency line
[Figure: two-user example, x1 vs. x2, with the efficiency line separating underload from overload]

19. Fair Allocation
- Max-min fairness: flows which share the same bottleneck get the same amount of bandwidth
- Assumes no knowledge of priorities
- Fairness = 1 − distance from the fairness line
[Figure: two-user example, x1 vs. x2, with the fairness line; regions where user 1 or user 2 gets too much]

20. Control System Model [CJ89]
- Simple, yet powerful model
[Figure: users 1..n send at rates x1..xn; the network returns a binary feedback y indicating whether Σxi exceeds Xgoal]

21. Possible Choices
- Multiplicative increase, additive decrease: aI = 0, bI > 1, aD < 0, bD = 1
- Additive increase, additive decrease: aI > 0, bI = 1, aD < 0, bD = 1
- Multiplicative increase, multiplicative decrease: aI = 0, bI > 1, aD = 0, 0 < bD < 1
- Additive increase, multiplicative decrease: aI > 0, bI = 1, aD = 0, 0 < bD < 1

[Slides 22-33 are not included in this extract, apart from a slow-start figure: cwnd = 1, segment 1, ACK 1; cwnd = 2, segments 2-3, ACK 2; cwnd = 4, segments 4-7, ACKs 3-4; cwnd = 8, ... (cwnd doubles each RTT while cwnd <= ssthresh).]

34. Congestion Avoidance
- Slow down "slow start"
- If cwnd > ssthresh, then each time a segment is acknowledged increment cwnd by 1/cwnd (cwnd += 1/cwnd)
- So cwnd is increased by one only after a full window of segments has been acknowledged
- Congestion avoidance and slow start are independent algorithms with different objectives; in practice they are implemented together

35. Slow Start / Congestion Avoidance Example
- Assume that ssthresh = 8
[Figure: cwnd (in segments) vs. round-trip times, with ssthresh marked]

36. Putting Everything Together: TCP Pseudocode
Initially:
    cwnd = 1; ssthresh = infinite;
New ACK received:
    if (cwnd < ssthresh)
        cwnd = cwnd + 1;        /* slow start */
    else
        cwnd = cwnd + 1/cwnd;   /* congestion avoidance */
Timeout:
    ssthresh = cwnd / 2;        /* multiplicative decrease */
    cwnd = 1;
Sending:
    while (next < unack + win)
        transmit next packet;
    where win = min(cwnd, flow_win);
(see the sketch after slide 42 below)
[Figure: sequence-number space showing unack, next, and win]

37. Packet Loss
- Packet loss detected by retransmission timeouts (RTO timer) or duplicate ACKs (at least 3)
[Figure: packets 1-6 sent; one packet is lost, so the receiver repeats ACK 3 (duplicate ACKs) until the retransmission arrives]

38. Fast Retransmit
- Immediately retransmit after 3 dupACKs, without waiting for a timeout (fast retransmit)
- Adjust ssthresh: ssthresh ← max(flightsize/2, 2), where flightsize is the amount of data that has been sent but not yet acknowledged
- Enter slow start (cwnd = 1)
- Assumption: loss indicates congestion (buffer overflow)
- The fast retransmit algorithm first appeared in the 4.3BSD Tahoe release, and it was followed by slow start

39. Summary: Tahoe
- Basic ideas:
  - Gently probe the network for spare capacity
  - Drastically reduce the rate on congestion
  - Windowing: self-clocking
  - Other functions: round-trip time estimation, error recovery
- For every ACK: if W < ssthresh then W += 1 (slow start), else W += 1/W (congestion avoidance)
- For every loss: ssthresh := W/2; W := 1

40. TCP Reno (Jacobson 1990)
- Slow start, congestion avoidance, fast retransmit
- Addition: fast recovery
- Most widely used version of TCP today
[Figure: window vs. time; ss: slow start, ca: congestion avoidance]

41. Fast Recovery
- Motivation: prevent the pipe from emptying after fast retransmit
- Idea: each dupACK represents a packet having left the pipe, that is, there is still data flowing between the two ends
- Enter FR/FR after 3 dupACKs:
  - Set ssthresh ← max(flightsize/2, 2)
  - Retransmit the lost packet
  - Set cwnd ← ssthresh + 3*MSS (window inflation)
  - Each time another duplicate ACK arrives, cwnd += MSS; transmit a packet if allowed by the new value of cwnd
  - On a non-dup ACK (1 RTT later), set cwnd ← ssthresh (window deflation)
- Enter congestion avoidance

42. Example: FR/FR
- Fast retransmit: retransmit on 3 dupACKs
- Fast recovery: inflate the window while repairing the loss, to keep the pipe full
- The fast recovery algorithm appeared in the 4.3BSD Reno release
[Figure: FR/FR timeline between source S and destination D; the sender exits FR/FR about one RTT after the retransmission]
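The window rules on slides 36, 38, and 41 can be collected into one small state machine. The Python sketch below is my own simplification, not code from the deck: it counts cwnd in segments rather than bytes, omits actual retransmission and RTO estimation, and combines the slide-36 timeout rule with the slide-41 fast-recovery rules.

class RenoWindow:
    """Simplified Reno cwnd bookkeeping, in units of MSS-sized segments."""

    def __init__(self):
        self.cwnd = 1.0                       # congestion window (segments)
        self.ssthresh = float("inf")
        self.dupacks = 0
        self.in_recovery = False

    def on_new_ack(self):
        if self.in_recovery:                  # window deflation: end of FR/FR
            self.cwnd = self.ssthresh
            self.in_recovery = False
        elif self.cwnd < self.ssthresh:       # slow start
            self.cwnd += 1
        else:                                 # congestion avoidance
            self.cwnd += 1.0 / self.cwnd
        self.dupacks = 0

    def on_dup_ack(self, flightsize):
        self.dupacks += 1
        if self.in_recovery:                  # inflation: one dupACK = one packet left the pipe
            self.cwnd += 1
        elif self.dupacks == 3:               # fast retransmit + enter fast recovery
            self.ssthresh = max(flightsize / 2.0, 2.0)
            self.cwnd = self.ssthresh + 3     # inflate by the three dupACKs
            self.in_recovery = True
            # (the lost segment would be retransmitted here)

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0       # multiplicative decrease (slide 36)
        self.cwnd = 1.0                       # back to slow start
        self.dupacks = 0
        self.in_recovery = False

# Tiny walk-through: slow start, then a triple-dupACK loss, then deflation.
reno = RenoWindow()
for _ in range(4):
    reno.on_new_ack()                 # slow start: cwnd 1 -> 5
for _ in range(3):
    reno.on_dup_ack(flightsize=5)     # 3rd dupACK: ssthresh = 2.5, cwnd = 5.5
reno.on_new_ack()                     # deflate: cwnd = ssthresh = 2.5
print(reno.cwnd, reno.ssthresh)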
43. Summary: Reno
- Basic ideas: fast recovery avoids slow start; dupACKs → fast retransmit + fast recovery; timeout → fast retransmit + slow start
- At steady state, cwnd oscillates around the optimal window size
[Figure: cwnd vs. time showing slow start, congestion avoidance, FR/FR on dupACKs, and retransmit + slow start on timeout]

44. Variant: NewReno (Fall & Floyd 1996)
- Motivation: multiple losses within a window
  - A partial ACK acknowledges some but not all packets outstanding at the start of FR
  - A partial ACK takes Reno out of FR and deflates the window
  - The sender may have to wait for a timeout before proceeding
- Idea: a partial ACK indicates lost packets
  - Stay in FR/FR and retransmit immediately
  - Retransmit 1 lost packet per RTT until all lost packets from that window are retransmitted
  - Eliminates the timeout

45. Example: NewReno
- On 3 dupACKs, the receiver has packets 2, 4, 6, 8; cwnd = 8; retransmit packet 1; enter FR/FR
- Next dupACK: increment cwnd to 9
- After an RTT, an ACK arrives for packets 1 and 2; exit FR/FR; cwnd = 5; 8 unacked packets
- No more ACKs: the sender must wait for a timeout
[Figure: NewReno timeline between source S and destination D, ending in a timeout]

46. Variant: SACK (Mathis, Mahdavi, Floyd, Romanow 1996)
- Motivation: Reno and NewReno retransmit at most 1 lost packet per RTT; the pipe can be emptied during FR/FR with multiple losses
- Idea: SACK provides a better estimate of the packets in the pipe
  - The SACK TCP option describes received packets
  - On 3 dupACKs: retransmit, halve the window, enter FR
  - Update pipe = packets in the pipe: increment when lost or new packets are sent, decrement when a dupACK is received
  - Transmit a (lost or new) packet when pipe < cwnd
  - Exit FR when all packets outstanding when FR was entered are acknowledged

47. TCP Vegas (Brakmo & Peterson 1994)
- Reno with a new congestion avoidance algorithm
- Converges (provided the buffer is large)!
[Figure: window vs. time; ss: slow start, ca: congestion avoidance]

48. Vegas Congestion Avoidance
- Each source estimates the number of its own packets in the pipe from the RTT
- Adjusts the window to keep the estimate between αd and βd (see the sketch below, after slide 50)
- For every RTT:
    if W/RTTmin − W/RTT < α/RTTmin then W ← W + 1
    if W/RTTmin − W/RTT > β/RTTmin then W ← W − 1
- For every loss: W := W/2

49. Implications
- Congestion measure = end-to-end queueing delay
- At equilibrium: zero loss; stable window at full utilization; approximately weighted proportional fairness; nonzero queue, larger for more sources
- Convergence to equilibrium: converges if there is sufficient network buffering; oscillates like Reno otherwise

50. Reflections on TCP
- Assumes that all sources cooperate
- Assumes that congestion occurs on time scales greater than 1 RTT
- Only useful for reliable, in-order delivery, non-real-time applications
- Vulnerable to non-congestion-related loss (e.g. wireless)
- Can be unfair to long-RTT flows
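Slide 48's Vegas adjustment compares the expected rate W/RTTmin with the actual rate W/RTT once per RTT. Below is a small Python sketch of that rule as written on the slide; the function names, the α = 1 / β = 3 defaults, and the example RTT trace are my own illustrative choices, not values from the deck.

def vegas_update(w: float, rtt: float, rtt_min: float,
                 alpha: float = 1.0, beta: float = 3.0) -> float:
    """One Vegas congestion-avoidance step (slide 48), applied once per RTT.

    diff = expected rate - actual rate = W/RTTmin - W/RTT.
    Keeping diff between alpha/RTTmin and beta/RTTmin keeps the estimated
    number of this flow's packets queued in the network between alpha and beta.
    """
    diff = w / rtt_min - w / rtt
    if diff < alpha / rtt_min:
        return w + 1        # too few packets queued: grow the window
    if diff > beta / rtt_min:
        return w - 1        # too many packets queued: shrink the window
    return w                # within [alpha, beta]: hold

def vegas_on_loss(w: float) -> float:
    """On loss, halve the window (slide 48)."""
    return w / 2

# Example: base RTT 100 ms; as queueing delay builds up, the window backs off.
w = 10.0
for rtt in (0.100, 0.105, 0.140, 0.180):
    w = vegas_update(w, rtt, rtt_min=0.100)
    print(f"rtt={rtt*1000:.0f} ms -> W={w:.0f}")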
51. Outline
- Window Flow Control
- Analysis of TCP Congestion Control
- Tahoe, Reno, Vegas
- Beyond TCP Congestion Control

52. TCP Problems
- When TCP congestion control was originally designed in 1988:
  - Maximum link bandwidth: 10 Mb/s
  - Users were mostly from academic and government organizations (i.e., well-behaved)
  - Almost all links were wired (i.e., negligible error rate)
- Thus, current problems with TCP:
  - High bandwidth-delay product paths
  - Selfish users
  - Wireless (or any high-error links)

53. High Bandwidth-Delay Product Paths
- Motivation:
  - 10 Gb/s links are now common in the Internet core; as a result of Wavelength-Division Multiplexing (WDM), link bandwidth doubles every 9 months
  - Some users have access to and need for 10 Gb/s end to end (e.g., very large scientific or financial databases)
  - Satellite/interplanetary links have a high delay
- Problems: slow start; additive increase, multiplicative decrease (AIMD)

54. Slow Start
- TCP throughput is controlled by the congestion window (cwnd) size
- In slow start the window increases exponentially, but that may not be enough
- Example: 10 Gb/s, 200 ms RTT, 1460 B payload, assume no loss (see the sketch after slide 63 below)
  - Time to fill the pipe: 18 round trips = 3.6 seconds
  - Data transferred until then: 382 MB
  - Throughput at that time: 382 MB / 3.6 s = 850 Mb/s
  - 8.5% utilization: not very good
- Lose only one packet → drop out of slow start into AIMD (even worse)

55. AIMD
- In AIMD, cwnd increases by 1 packet per RTT
- Available bandwidth could be large:
  - e.g., 2 flows share a 10 Gb/s link, one flow finishes → available bandwidth is 5 Gb/s
  - e.g., a loss during slow start → drop into AIMD at probably much less than 10 Gb/s
- Time to reach 100% utilization is proportional to the available bandwidth
  - e.g., 5 Gb/s available, 200 ms RTT, 1460 B payload → about 17,000 s

56. Simulation Results
- Shown analytically in [Low01] and via simulations
[Figure: average TCP utilization vs. bottleneck bandwidth (Mb/s) at RTT = 80 ms, and vs. round-trip delay (s) at bottleneck BW = 155 Mb/s; 50 flows in both directions, buffer = BW × delay]

57. Proposed Solution: Decouple Congestion Control from Fairness
- In TCP, additive-increase multiplicative-decrease (AIMD) controls both; they are coupled because a single mechanism controls both
- How does decoupling solve the problem?
  1. To control congestion: use MIMD, which gives a fast response
  2. To control fairness: use AIMD, which converges to fairness

58. Characteristics of the Solution
1. Improved congestion control (in high bandwidth-delay and conventional environments): small queues, almost no drops
2. Improved fairness
3. Scalable (no per-flow state)
4. Flexible bandwidth allocation: max-min fairness, proportional fairness, differential bandwidth allocation, ...

59. XCP: An eXplicit Control Protocol
1. Congestion controller
2. Fairness controller

60. How Does XCP Work?
[Figure: each packet carries a congestion header with three fields: Feedback, Round Trip Time, Congestion Window; the sender fills in its RTT and cwnd and an initial feedback request, e.g. Feedback = +0.1 packet]

61. How Does XCP Work?
[Figure: routers along the path update the feedback field, e.g. reducing it from +0.1 packet to -0.3 packet]

62. How Does XCP Work?
- Congestion Window = Congestion Window + Feedback
- Routers compute the feedback without any per-flow state
- XCP extends ECN and CSFQ

63. How Does an XCP Router Compute the Feedback?
- A congestion controller and a fairness controller
- Fairness controller goal: divide the aggregate feedback Δ between flows to converge to fairness; looks at a flow's state in the congestion header
- Algorithm: if Δ > 0, divide it equally between flows; if Δ < 0, divide it between flows proportionally to their current rates
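The figures on slides 54 and 55 follow from simple arithmetic. The Python sketch below uses my own helper functions and the slides' assumptions (10 Gb/s, 200 ms RTT, 1460 B payload, no loss, exact doubling of cwnd per RTT during slow start, one extra packet per RTT in AIMD, no delayed ACKs); it approximately reproduces the 18 RTTs / 3.6 s / 382 MB / roughly 850 Mb/s slow-start numbers and the roughly 17,000 s AIMD ramp for 5 Gb/s of spare capacity.

import math

def slow_start_fill_time(capacity_bps, rtt_s, payload_bytes):
    """Round trips for slow start (cwnd doubling from 1) to reach the bandwidth-delay product."""
    bdp_packets = capacity_bps * rtt_s / (payload_bytes * 8)
    rounds = math.ceil(math.log2(bdp_packets))        # smallest n with 2**n >= BDP
    data_bytes = (2**rounds - 1) * payload_bytes      # 1 + 2 + 4 + ... packets sent so far
    return rounds, rounds * rtt_s, data_bytes

def aimd_ramp_time(extra_bps, rtt_s, payload_bytes):
    """Time for AIMD (one extra packet per RTT) to claim extra_bps of spare bandwidth."""
    extra_packets_per_rtt = extra_bps * rtt_s / (payload_bytes * 8)
    return extra_packets_per_rtt * rtt_s

cap, rtt, mss = 10e9, 0.2, 1460
rounds, t, data = slow_start_fill_time(cap, rtt, mss)
avg_bps = data * 8 / t
print(f"slow start: {rounds} RTTs = {t:.1f} s, {data/1e6:.1f} MB sent, "
      f"avg {avg_bps/1e6:.0f} Mb/s ({avg_bps/cap:.1%} of capacity)")
print(f"AIMD ramp to claim 5 Gb/s of spare capacity: about {aimd_ramp_time(5e9, rtt, mss):,.0f} s")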
[Slides 64-81 are not included in this extract.]

82. FEC
- Encode k data blocks into n > k coded blocks
- Can recover the original k blocks from any k of the n blocks
- n − k blocks of overhead: trade bandwidth for loss
- Can recover from loss in time independent of the link RTT: useful for links with a long RTT (e.g. satellite)
- Pay the n − k overhead whether there is loss or not
- Need to adapt n and k depending on current channel conditions

83. Hybrid: Indirect TCP [BB95]
- Split the TCP connection into two parts: regular TCP from the fixed host (FH) to the base station, and modified TCP from the base station to the mobile host (MH)
- What if the base station fails?
- What if the wired path is faster than the wireless path?

84. Hybrid: TCP Snoop [BSK95]
- The base station snoops TCP packets and infers flows
- Caches data packets going to the wireless side
- If dup ACKs arrive from the wireless side, suppress the ACK and retransmit from the cache
- Soft state
- What about non-TCP protocols?
- What if wireless is not the last hop?

85. Conclusion
- Transport protocol modifications have largely not been deployed; SACK was deployed because of its general utility
- Cellular, 802.11b: link-level retransmissions
  - 802.11b: ACKs are necessary anyway in the MAC for collision avoidance
  - Additional delay is only a few link RTTs (5 ms)
- Satellite: FEC, because of the long-RTT issues
- Link-layer solutions give adequate, predictable performance and are easily deployable

86. References
- W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms. RFC 2001, January 1997.
- S. Floyd and T. Henderson. The NewReno Modification to TCP's Fast Recovery Algorithm. RFC 2582, April 1999.
- M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP Selective Acknowledgment Options. RFC 2018, October 1996.

87. Reading Papers
- Dina Katabi, Mark Handley, and Charlie Rohrs. Congestion Control for High Bandwidth-Delay Product Networks. In Proceedings of ACM SIGCOMM 2002. http://ana.lcs.mit.edu/dina/XCP/
- L. Brakmo, S. O'Malley, and L. Peterson. TCP Vegas: New Techniques for Congestion Detection and Avoidance. In Proceedings of ACM SIGCOMM 1994.