Sockets are a way to speak to other programs using standard Unix file descriptors. When Unix programs do any sort of I/O, they do it by reading from or writing to a file descriptor. A file descriptor is simply an integer associated with an open file. That file can be a network connection, a FIFO, a pipe, a terminal, a real on-the-disk file, or just about anything else. To get a file descriptor for network communication, you make a call to the socket() system routine. It returns the socket descriptor, and you communicate through it using the specialized send() and recv() calls. You can also use the normal read() and write() calls to communicate through a socket, but send() and recv() offer much greater control over your data transmission.
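As a minimal sketch of the point that a socket is just a file descriptor, the helper below writes with send() and reads the same data back with plain read(). The name roundtrip_over_socketpair() is made up for this illustration. It assumes a POSIX system and uses socketpair() to get two already-connected descriptors so that no network setup is needed.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: push msg through a connected pair of sockets.
 * out must have room for at least one byte.
 * Returns the number of bytes received, or -1 on error. */
ssize_t roundtrip_over_socketpair(const char *msg, char *out, size_t outlen)
{
    int fds[2];               /* socket descriptors are plain ints */
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    /* send() is the socket-specific call; the last argument is flags */
    if (send(fds[0], msg, strlen(msg), 0) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* read() works too, because fds[1] is an ordinary file descriptor */
    n = read(fds[1], out, outlen - 1);
    if (n >= 0)
        out[n] = '\0';

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Swapping read() for recv(fds[1], out, outlen - 1, 0) would behave the same here; the extra flags argument is where the additional control comes from.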
Stream sockets are reliable two-way connected communication streams. If you output two items into the socket in the order "1, 2", they will arrive in the same order "1, 2" at the opposite end. They will also be error-free. This type is referred to as "SOCK_STREAM". Telnet and HTTP are examples of applications that use stream sockets. Stream sockets use TCP, which gets its reliability by acknowledging received packets and retransmitting lost ones.
Datagram sockets are also known as connectionless sockets, and this type is referred to as "SOCK_DGRAM". With datagram sockets you don't have to maintain an open connection as you do with stream sockets. Applications of datagram sockets include tftp and bootp. They use UDP.
The socket descriptor is the file descriptor that refers to a socket. A socket descriptor is of type int (integer).
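The two socket types and the int descriptor can be seen together in a short sketch. The helper name socket_type_roundtrip() is hypothetical. It opens an Internet socket of the requested type, asks the kernel which type it actually got, and closes it again; a POSIX system is assumed.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open an Internet socket of the given type
 * (SOCK_STREAM or SOCK_DGRAM), ask the kernel what type it really is,
 * then close it. Returns the reported type, or -1 on error. */
int socket_type_roundtrip(int type)
{
    int sockfd;                        /* the socket descriptor: just an int */
    int reported = -1;
    socklen_t len = sizeof reported;

    sockfd = socket(AF_INET, type, 0); /* 0 = default protocol for the type */
    if (sockfd == -1)
        return -1;

    if (getsockopt(sockfd, SOL_SOCKET, SO_TYPE, &reported, &len) == -1)
        reported = -1;

    close(sockfd);
    return reported;
}
```

Calling socket_type_roundtrip(SOCK_STREAM) should hand back SOCK_STREAM, and likewise for SOCK_DGRAM.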
There are two byte orderings: most significant byte first, or least significant byte first (a byte is sometimes called an "octet"). The former is called "Network Byte Order". Some machines store their numbers internally in Network Byte Order, some don't. Network Byte Order is also known as "Big-Endian Byte Order".
The first struct is struct sockaddr. This structure holds socket address information for many types of sockets:
struct sockaddr {
    unsigned short sa_family; // address family, AF_xxx
    char sa_data[14]; // 14 bytes of protocol address
};
sa_family can be AF_UNIX (UNIX path names), AF_INET (DARPA Internet addresses), or AF_OSI (as specified by the international standards for OSI). The various address families are defined in the file "sys/socket.h". For this tutorial, sa_family will be AF_INET.
To deal with struct sockaddr, programmers created a parallel structure: struct sockaddr_in ("in" for "Internet").
struct sockaddr_in {
    short int sin_family; // Address family
    unsigned short int sin_port; // Port number
    struct in_addr sin_addr; // Internet address
    unsigned char sin_zero[8]; // Same size as struct sockaddr
};
This structure makes it easy to reference elements of the socket address. Note that sin_zero (which is included to pad the structure to the length of a struct sockaddr) should be set to all zeros with the function memset(). Also, and this is the important bit, a pointer to a struct sockaddr_in can be cast to a pointer to a struct sockaddr and vice-versa. So even though a call such as connect() wants a struct sockaddr*, you can still use a struct sockaddr_in and cast it at the last minute! Also, notice that sin_family corresponds to sa_family in a struct sockaddr and should be set to "AF_INET". Finally, sin_port and sin_addr must be in Network Byte Order!
"But, how can the entire structure, struct in_addr sin_addr, be in Network Byte Order?" This question requires careful examination of struct in_addr, which was historically declared as a union:
// Internet address (a structure for historical reasons)
struct in_addr {
    uint32_t s_addr; // a 32-bit unsigned integer, or 4 bytes
};
So if you have declared ina to be of type struct sockaddr_in, then ina.sin_addr.s_addr references the 4-byte IP address (in Network Byte Order). Note that even if your system still uses the union for struct in_addr, you can still reference the 4-byte IP address in exactly the same way as we did above (this is due to #defines).
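To make the cast between struct sockaddr_in and struct sockaddr concrete, here is a small sketch. Both helper names (fill_any_addr() and family_via_generic_pointer()) are invented for this example, and the choice of INADDR_ANY as the address is only an assumption to keep it short.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill in an IPv4 socket address for the given port,
 * leaving the IP address as INADDR_ANY (0.0.0.0). */
void fill_any_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);             /* also zeroes sin_zero */
    addr->sin_family = AF_INET;                /* host byte order */
    addr->sin_port = htons(port);              /* network byte order */
    addr->sin_addr.s_addr = htonl(INADDR_ANY); /* network byte order */
}

/* Hypothetical helper: read the family through the generic pointer type,
 * the same cast you would make when passing the address to a socket call. */
int family_via_generic_pointer(const struct sockaddr_in *addr)
{
    const struct sockaddr *generic = (const struct sockaddr *)addr;
    return generic->sa_family;                 /* same field as sin_family */
}
```

Because sin_family and sa_family sit at the same place in both structures, the value read through the generic pointer is the AF_INET that was stored through the Internet-specific one.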
There are two types that you can convert: short (two bytes) and long (four bytes). Here "long" means a 32-bit value, whatever the size of a C long on your machine. These functions work for the unsigned variations as well. Say you want to convert a short from Host Byte Order to Network Byte Order. Start with "h" for "host", follow it with "to", then "n" for "network", and "s" for "short": h-to-n-s, or htons() (read: "Host to Network Short").
You can use every combination of "n", "h", "s", and "l" you want, not counting the ones that don't exist. For example, there is NOT a stolh() ("Short to Long Host") function--not at this party, anyway. But there are:
htons() -- "Host to Network Short"
htonl() -- "Host to Network Long"
ntohs() -- "Network to Host Short"
ntohl() -- "Network to Host Long"
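The four conversion functions above can be explored with a short sketch. The helper names (host_is_big_endian() and network_bytes_of_short()) are hypothetical; the point is only that after htons() the most significant byte always comes first in memory, whatever the host's own ordering.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical helper: return 1 if this host stores the most significant
 * byte first (i.e. Host Byte Order already equals Network Byte Order). */
int host_is_big_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* look at the byte at the lowest address */
    return first == 0x01;
}

/* Hypothetical helper: copy the in-memory bytes of htons(value) into out.
 * Whatever the host, out[0] ends up as the most significant byte. */
void network_bytes_of_short(uint16_t value, unsigned char out[2])
{
    uint16_t net = htons(value);

    memcpy(out, &net, sizeof net);
}
```

On a big-endian host htons() is a no-op; on a little-endian host it swaps the two bytes. Either way, ntohs(htons(x)) gives back x.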
A final point: why do sin_addr and sin_port need to be in Network Byte Order in a struct sockaddr_in, but sin_family does not? The answer: sin_addr and sin_port get encapsulated in the packet at the IP and transport (UDP or TCP) layers, respectively. Thus, they must be in Network Byte Order. However, the sin_family field is only used by the kernel to determine what type of address the structure contains, and it never gets sent out on the network, so it must be in Host Byte Order.
There are a bunch of functions that allow you to manipulate IP addresses.
First, let's say you have a struct sockaddr_in ina, and you have an IP address "10.12.110.57" that you want to store into it. The function you want to use, inet_addr(), converts an IP address in numbers-and-dots notation into an unsigned 32-bit integer. The assignment can be made as follows:
ina.sin_addr.s_addr = inet_addr("10.12.110.57");
Notice that inet_addr() returns the address in Network Byte Order already--you don't have to call htonl().
Now, the above code snippet isn't very robust because there is no error checking. See, inet_addr() returns -1 on error. Remember binary numbers? (unsigned)-1 just happens to correspond to the IP address 255.255.255.255! That's the broadcast address! Remember to do your error checking properly.
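One way to do that error checking is sketched below. The helper name set_ip_checked() is made up for this example. It compares the result against INADDR_NONE, the named constant for the all-ones error value, which also means the broadcast address cannot be stored through this helper.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: store a dotted-quad string in ina->sin_addr.
 * Returns 0 on success, -1 if inet_addr() reports an error. */
int set_ip_checked(struct sockaddr_in *ina, const char *dotted)
{
    in_addr_t addr = inet_addr(dotted);   /* already in Network Byte Order */

    /* INADDR_NONE is the error value (all bits set, the same as
     * (in_addr_t)-1), so "255.255.255.255" is rejected too. */
    if (addr == INADDR_NONE)
        return -1;

    ina->sin_addr.s_addr = addr;
    return 0;
}
```

The ambiguity is the price of this interface: a bad string and the broadcast address produce the same return value, so the helper has to treat both as failures.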
Actually, there's a cleaner interface you can use instead of inet_addr(): it's called inet_aton() ("aton" means "ascii to network"):
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int inet_aton(const char *cp, struct in_addr *inp);
And here's a sample usage, while packing a struct sockaddr_in (this example will make more sense to you when you get to the sections on bind() and connect().)
struct sockaddr_in my_addr;
my_addr.sin_family = AF_INET; // host byte order
my_addr.sin_port = htons(MYPORT); // short, network byte order
inet_aton("10.12.110.57", &(my_addr.sin_addr));
memset(&(my_addr.sin_zero), '\0', 8); // zero the rest of the struct
inet_aton(), unlike practically every other socket-related function, returns non-zero on success, and zero on failure. And the address is passed back in inp.
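Since the return convention is the reverse of what you might expect, here is a sketch of the same packing code with the result of inet_aton() checked. The helper name pack_addr() is hypothetical, and the sketch assumes a platform that provides inet_aton().

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: pack an IPv4 address and port into *addr.
 * Returns 0 on success, -1 if the address string is not valid. */
int pack_addr(struct sockaddr_in *addr, const char *dotted, unsigned short port)
{
    memset(addr, 0, sizeof *addr);        /* zeroes sin_zero as well */
    addr->sin_family = AF_INET;           /* host byte order */
    addr->sin_port = htons(port);         /* short, network byte order */

    /* inet_aton() returns zero on FAILURE, the reverse of most socket calls */
    if (inet_aton(dotted, &addr->sin_addr) == 0)
        return -1;

    return 0;
}
```

Unlike inet_addr(), this version can tell a bad string apart from "255.255.255.255", because success and failure are reported separately from the address itself.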
Unfortunately, not all platforms implement inet_aton() so, although its use is preferred, the older, more common inet_addr() is used in this guide.
To convert binary IP representations to string IP addresses, you will have to use the function inet_ntoa() ("ntoa" means "network to ascii") like this:
printf("%s", inet_ntoa(ina.sin_addr));
That will print the IP address. Note that inet_ntoa() takes a struct in_addr as an argument, not a long. Also notice that it returns a pointer to a char. This points to a statically stored char array within inet_ntoa() so that each time you call inet_ntoa() it will overwrite the last IP address you asked for. For example:
char *a1, *a2;
.
.
a1 = inet_ntoa(ina1.sin_addr); // this is 192.168.4.14
a2 = inet_ntoa(ina2.sin_addr); // this is 10.12.110.57
printf("address 1: %s\n",a1);
printf("address 2: %s\n",a2);
will print:
address 1: 10.12.110.57
address 2: 10.12.110.57
If you need to save the address, strcpy() it to your own character array.
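A sketch of that copy step follows. The helper name save_ip_string() is invented for this example, and the length check before strcpy() is an addition of ours to keep the copy within the caller's buffer.

```c
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: copy the text form of addr into a caller-owned
 * buffer so a later inet_ntoa() call cannot overwrite it.
 * Returns 0 on success, -1 if the buffer is too small. */
int save_ip_string(struct in_addr addr, char *buf, size_t buflen)
{
    const char *text = inet_ntoa(addr);   /* points at static storage */

    if (strlen(text) + 1 > buflen)
        return -1;

    strcpy(buf, text);                    /* safe: length checked above */
    return 0;
}
```

A buffer of INET_ADDRSTRLEN bytes (from netinet/in.h) is large enough for any dotted-quad string. Calling the helper once for ina1.sin_addr and once for ina2.sin_addr, each with its own buffer, keeps both addresses intact.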