CS144 Lab 0

Lab Checkpoint 0: networking warmup

Checkpoint0 spec

Expected hours to complete: 2-6 hours

Over the course of this 8-part lab assignment, you’ll be building up your own implementation of a significant portion of the Internet—a router, a network interface, and the TCP protocol.

在Lab0中,我们需要实现一个使用TCP套接字获取网页的简易Web客户端和一个内存中的可靠字节流(ByteStream)。

FAQs

  1. What should the behavior of my program be if the caller tries to pop with a len greater than what is available?

We don't have any preference on the behavior if the caller tries to pop more than is buffered in the stream, as long as you behave reasonably and don't crash in that situation. If you want to pop the maximum available, that is fine with us. If you want to throw an exception, that is also fine with us.

  1. How fast should my code be?

Here's a reasonable rule: if any of your tests are taking longer than 2 seconds, it's probably good to keep exploring other ways of representing the ByteStream. And if your benchmark results at the end of the test are worse than 0.1 Gbit/s (100 million bits per second), that's another sign the design may need to be rethought. This lab doesn't require any sophisticated data structures, but it might still take some trial-and-error to find a reasonable approach. Sometimes "big-O" analysis can be misleading about the performance of real programs—the constant factors can matter a lot!

Set up GNU/Linux on your computer

Install the CS144 VirtualBox virtual-machine image (instructions at https://stanford.edu/class/cs144/vm_howto/vm-howto-image.html).

需要注意的是,VirtualBox需要直接在主机操作系统上运行,比如如果你是Windows主机,那么就需要在Windows系统中安装和运行VirtualBox(不能在WSL这样的虚拟环境中运行)

Networking by hand

Fetch a Web Page

  1. Fetch a web page http://cs144.keithw.org/hello
1
2
3
4
5
telnet cs144.keithw.org http 

GET /hello HTTP/1.1
Host: cs144.keithw.org
Connection: close
  1. Fetch a web page http://cs144.keithw.org/lab0/sunetid

Fetch a web page

Send yourself an email

这部分只有连接Stanford内部网络才能完成

Listening and connecting

In this part, we play around with server and client.

Writing a network program using an OS stream socket

You’ll write a program called “webget” that creates a TCP stream socket, connects to a Web server, and fetches a page.

Set up

注意在git clone之前可能需要先在虚拟机中配置Git代理(使用主机的VPN):

1
2
3
4
5
6
7
8
9
10
11
# 使用 10809 端口
git config --global http.proxy http://10.0.2.2:10809
git config --global https.proxy http://10.0.2.2:10809

# 查看当前代理设置
git config --global --get http.proxy
git config --global --get https.proxy

# 如果需要取消 Git 代理
git config --global --unset http.proxy
git config --global --unset https.proxy

Modern C++

  • Use the language documentation at https://en.cppreference.com as a resource. (We’d recommend you avoid cplusplus.com which is more likely to be out-of-date.)

  • Never use malloc() or free().

  • Never use new or delete.

  • Essentially never use raw pointers (*), and use “smart” pointers (unique ptr or shared ptr) only when necessary. (You will not need to use these in CS144.)

  • Avoid templates, threads, locks, and virtual functions. (You will not need to use these in CS144.)

  • Avoid C-style strings (char *str) or string functions (strlen(), strcpy()). These are pretty error-prone. Use a std::string instead.

  • Never use C-style casts (e.g., (FILE *)x). Use a C++ static_cast if you have to (you generally will not need this in CS144)

  • Prefer passing function arguments by const reference (e.g.: const Address & address)

  • Make every variable const unless it needs to be mutated.

  • Make every method const unless it needs to mutate the object.

  • Avoid global variables, and give every variable the smallest scope possible.

  • For suggestions on how to improve the code related to C++ programming practices:

    1
    cmake --build build --target tidy

    To format the code:

    1
    cmake --build build --target format

Writing webget

Before writing webget, read over the public interfaces (the part that comes after “public:” in the files util/socket.hh and util/file descriptor.hh).

In this part, we need to implement a simple Web client that fetches Web pages using a TCP socket provided by the operating system.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
void get_URL( const string& host, const string& path )
{
TCPSocket clientSocket;
Address serverAddress( host, "http" );

// Connect to the server.
clientSocket.connect( serverAddress );

// Send the HTTP request.
string request = "GET " + path + " HTTP/1.1\r\nHost: " + host + "\r\n" + "Connection: close\r\n\r\n";
clientSocket.write( (string_view)request );

// Read and print the response
while ( !clientSocket.eof() ) {
string response;
clientSocket.read( response );
cout << response;
}

clientSocket.close();
}

Test webget

An in-memory reliable byte stream

In this part, we need to implement a ByteStream object that provides reliable byte stream in memory. The Writer writes data into the stream and the Reader reads data from the stream.

What is the design of my ByteStream implementatino? What data structures and approach are taken?

设计这个ByteStream的关键在于选择用什么data structure来表示内存中存储数据的stream。这个data structure需要方便从一端写入,从另一端读出,以及peek其中存储的数据。

我第一个想到的是用queue,但是queue只能读取队列首尾的元素,不方便peek。

因此可以用vector<char>或者string,我这里选择用最直观的string。

1
std::string buffer_ {};

这样的话Writer的push很简单,直接用string的append函数:

1
2
3
4
5
6
7
void Writer::push( string data )
{
// Push data to the buffer, but only as much as available capacity allows.
uint64_t to_write = min( available_capacity(), data.size() );
buffer_.append( data, 0, to_write );
bytes_pushed_ += to_write;
}

那Reader的pop如何实现呢?

最容易想到的是每次pop len bytes的时候都将buffer_的前len个字符截掉:

1
buffer_ = buffer_.substr( len );

但是这样的话每次pop buffer_的内存都会重新分配,不符合pop 操作是 O(1) 的时间复杂度要求。

因此我想到用一个read_index_来追踪当前buffer_中读到的位置,这样每次pop只需要read_index_ += len,只在buffer_中已读过的bytes过多(超过buffer_ size的一半)时再重新整理:

1
2
3
4
5
6
7
8
9
10
11
12
13
void Reader::pop( uint64_t len )
{
// Pop as much data as possible from the buffer if len is greater than the amount of data available.
uint64_t to_read = min( bytes_buffered(), len );
read_index_ += to_read;
bytes_popped_ += to_read;

// When more than half of the buffer is read, shrink it to avoid unnecessary memory usage.
if ( read_index_ >= buffer_.size() / 2 ) {
buffer_ = buffer_.substr( read_index_ );
read_index_ = 0;
}
}

这样的话buffer_read_index之前的bytes是已经读过out of date的bytes,read_index之后的bytes才是被push进stream中等待被读取的bytes。

Finish!

All tests in check0 pass!

Test check0


CS144 Lab 0
http://oooscar8.github.io/2025/01/19/cs144-lab0/
作者
Alex Sun
发布于
2025年1月19日
许可协议