The surprising struggle to get a Unix Epoch time from a UTC string in C or C++
by PascalW
Is it a struggle though?
They needed to have a locale matching the language of the localised time string they wanted to parse, they needed to use strptime to parse the string, they needed to use timegm() to convert the result to seconds when seen as UTC. The man pages pretty much describe these things.
The interface of these things could certainly be nicer, but most of the things they bring up as issues aren't even relevant for the task they're trying to do. Why do they talk about daylight saving time being confusing when they're only trying to deal with UTC, which doesn't have it?
It is not.
#define _GNU_SOURCE  /* exposes strptime() and timegm() on glibc */
#include <stdio.h>
#include <time.h>

int main(void) {
    struct tm tm = {0};
    const char *time_str = "Mon, 20 Jan 2025 06:07:07 GMT";
    const char *fmt = "%a, %d %b %Y %H:%M:%S GMT";

    // Parse the time string (%a and %b depend on the current locale)
    if (strptime(time_str, fmt, &tm) == NULL) {
        fprintf(stderr, "Error parsing time\n");
        return 1;
    }

    // Convert to Unix timestamp (UTC)
    time_t timestamp = timegm(&tm);
    if (timestamp == -1) {
        fprintf(stderr, "Error converting to timestamp\n");
        return 1;
    }

    // time_t has no portable printf format, so cast explicitly
    printf("Unix timestamp: %lld\n", (long long)timestamp);
    return 0;
}
It is a C99 code snippet that parses the UTC time string and safely converts it to a Unix timestamp, and it follows best practices from the SEI CERT C standard, avoiding locale and timezone issues by using UTC and timegm(). You can avoid the pitfalls of mktime() by using timegm(), which works directly with UTC time.
Where is the struggle? Am I misunderstanding it?
Oh by the way, must read: https://www.catb.org/esr/time-programming/ (Time, Clock, and Calendar Programming In C by Eric S. Raymond)
"Mon, 20 Jan 2025 06:07:07 GMT"
I thought the default output of date(1), with TZ unset, is something like
Mon Jan 20 06:07:07 UTC 2025
That's the busybox default anyway
Well, `Mon Jan 20 06:07:07 UTC 2025` does not match `fmt` in the code. My input matches the format string exactly, which is why it works.
You could use `"%a %b %d %H:%M:%S %Z %Y"` for `fmt` (which is indeed the default for `date`) and it would work with yours.
Both result in the same timestamp.
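For anyone who wants to check that claim without a C toolchain, here is a minimal sketch in Python, whose time.strptime accepts the same directives. It assumes the default C/English locale for %a and %b. The helper name to_epoch is made up for illustration, and calendar.timegm stands in for timegm().

```python
import calendar
import time

def to_epoch(text, fmt):
    # Parse with strptime-style directives, then treat the fields as UTC.
    return calendar.timegm(time.strptime(text, fmt))

# The HTTP-style string from the snippet above.
http_style = to_epoch("Mon, 20 Jan 2025 06:07:07 GMT", "%a, %d %b %Y %H:%M:%S GMT")
# The date(1)-style string with the alternative format.
date_style = to_epoch("Mon Jan 20 06:07:07 UTC 2025", "%a %b %d %H:%M:%S %Z %Y")

print(http_style, date_style)
```

Both calls should come out as 1737353227, since the two strings name the same instant.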
I can find `timegm` neither in the C99 standard draft nor in POSIX.1-2024.
The first sentence of your link reads:
>The C/Unix time- and date-handling API is a confusing jungle full of the corpses of failed experiments and various other traps for the unwary, many of them resulting from design decisions that may have been defensible when the originals were written but appear at best puzzling today.
Here is some of my code that works around not having timegm. It is detected in a configure script, so there's a #define symbol indicating whether it's available.
timegm was finally standardized by C23, and POSIX-2024 mentions it in the FUTURE DIRECTIONS section of mktime. I don't know precisely what happened with POSIX. I think timegm got lost in the shuffle and by the time Austin Group attention turned back to it, it made more sense to let C23 pick it up first so there were no accidental conflicts in specification.[1]
[1] POSIX-2024 incorporates C17, not C23, but in practice the typical POSIX environment going forward will likely be targeting POSIX-2024 + C23, or just POSIX-2024 + extensions; and hopefully neither POSIX nor C will wait as long between standard updates as previously.
https://man7.org/linux/man-pages/man3/timegm.3.html
It's not posix, but it's pretty available
Yeah, you're correct that `timegm` is neither part of the C99 standard nor officially specified in POSIX.1-2024 but it is widely supported in practice on many platforms, including glibc, musl, and BSD systems which makes it a pragmatic choice in environments where it is available. Additionally, it is easy to implement it in a portable way when unavailable.
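To make "easy to implement in a portable way" concrete, here is a hedged sketch of the arithmetic such a fallback usually needs, written in Python for brevity. It is not any commenter's code. The names days_from_civil and portable_timegm are illustrative, and the day-counting trick (start the year in March so the leap day comes last) is the commonly used days-from-civil approach. It assumes the proleptic Gregorian calendar and ignores leap seconds, as Unix time does.

```python
def days_from_civil(year, month, day):
    # Days between 1970-01-01 and the given proleptic Gregorian date.
    # Treat January and February as the tail of the previous year,
    # so the leap day is the last day of the shifted year.
    if month <= 2:
        year -= 1
    era, year_of_era = divmod(year, 400)          # 400-year Gregorian cycle
    month_index = month - 3 if month > 2 else month + 9   # March = 0
    day_of_year = (153 * month_index + 2) // 5 + day - 1
    day_of_era = (year_of_era * 365 + year_of_era // 4
                  - year_of_era // 100 + day_of_year)
    # 146097 days per 400 years; 719468 days from 0000-03-01 to 1970-01-01.
    return era * 146097 + day_of_era - 719468

def portable_timegm(year, month, day, hour, minute, second):
    # Unix time ignores leap seconds, so every day is 86400 seconds long.
    return (days_from_civil(year, month, day) * 86400
            + hour * 3600 + minute * 60 + second)

print(portable_timegm(2025, 1, 20, 6, 7, 7))
```

The same integer arithmetic carries over to C almost line for line, with the caveat that C integer division truncates toward zero, so years before the epoch need extra care there.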
So, while `timegm` is not standardized in C99 or POSIX, it is a practical solution in most real-world environments, and alternatives exist for portability, and thus: handling time in C is not inherently a struggle.
As for the link, it says "You may want to bite the bullet and use timegm(3), even though it’s nominally not portable.", but see what I wrote above.
timegm() is even available on Haiku
> Is it a struggle though?
It’s twelve lines or more, if you include the imports and error handling.
Spreadsheets and SQL will coerce a string to a date without even being asked to. You might want something more structured than that, but you should be able to do it in far less than 12 lines.
C has many clunky elements like this, which makes working with it like pulling teeth.
>Spreadsheets and SQL will coerce a string to a date without even being asked to.
But only when you don't want them to, when you do want them to do it it's still a pain.
almost like C is logically operating at a lower level than spreadsheets or SQL or something
Spreadsheets and SQL will coerce a string to a date because someone programmed them to in C or C++.
> you should be able to do it in far less than 12 lines
In C++, maybe. In C, not necessarily. If you're not willing to reinvent the wheel why'd you choose C anyway?
What's a man page? [cit]
"manual pages", type "man man" in your terminal.
Never type up man man, it might make the internet implode.
It's where people went for programming information before ChatGPT and even before StackOverflow.
It's where people went for information "even before" the internet.
I'm sorry the sarcasm was not evident. I learnt to program when men were men, and man was man.
My personal rule for time processing: use the language-provided libraries for ONLY 2 operations: converting back and forth between a formatted time string with a time zone, and a Unix epoch timestamp. Perform all other time processing in your own code based on those 2 operations, and whenever you start with a new language or framework, just learn those 2.
I've wasted so many dreary hours trying to figure out crappy time processing APIs and libraries. Never again!
Starting from timestamp A, how do I find the Unix timestamp B corresponding to exactly 6 months in the future from timestamp A?
Adding or subtracting "months" is inherently difficult because months don't have set lengths, varying from 28 through 31 days. Thus adding one month to May 31 is weird: should that be June 30 or July 1 or some other date?
Try not to have to do this sort of thing. You might have to though, and then you'll have to figure out what adding months means for your app.
Welcome to Business Logic. This is where I'd really like pushback to result in things that aren't edge cases.
However you also run into day to day business issues like:
* What if it's now a Holiday and things are closed?
* What if it's some commonly busy time like winter break? (Not quite a single holiday)
* What if a disaster of some kind (even just a burst water pipe) halts operations in an unplanned way?
Usually flexibility needs to be built in. It can be fine to 'target' +3 months, but specify it as something like +3m(-0d:+2w) (so, add '3 months' ignoring the day of month, clamp the day of month to a valid value, and allow 0 days before or 14 days after).
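The "add months, ignore the day, then clamp" part of that rule can be sketched as follows. This is only one possible policy, done in UTC. The function name add_months_clamped is hypothetical, and the tolerance window (-0d:+2w) is left out because it depends on the business rules.

```python
import calendar
from datetime import datetime, timezone

def add_months_clamped(ts, months):
    # Shift a Unix timestamp by whole calendar months (UTC calendar),
    # clamping the day of month to the last valid day of the target month.
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    index = dt.year * 12 + (dt.month - 1) + months   # months since year 0
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    shifted = dt.replace(year=year, month=month0 + 1, day=min(dt.day, last_day))
    return int(shifted.timestamp())

may_31 = calendar.timegm((2025, 5, 31, 9, 0, 0))
june = datetime.fromtimestamp(add_months_clamped(may_31, 1), tz=timezone.utc)
print(june.isoformat())
```

With this policy May 31 plus one month lands on June 30. A policy that rolls over to July 1 instead is equally defensible, which is the point the thread is making.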
Do all edge cases need to be handled? Just be late when there's a holiday.
72 business hours sounds more like human time than computer time anyways.
Yes, basically, they do need to be handled, but you have to define that for your own case. It's a real pain, if you have to do month math.
I think the parent is describing a "bring your own library" approach where a set of algorithms known to the author will be used for those calculations and the only thing the host language will be used for is the parse/convert.
It does remove a lot of the ambiguity of "I wonder what this stdlib's quirks are in their date calculations" but it also seems like a non-trivial amount of effort to port every time.
The difficulty of this problem rests on the ambiguity of the phrase "exactly 6 months", which is going to depend totally on the precise business logic. But there's no reason to suppose that the requirements of the business logic will agree with the concepts implemented by the datetime library.
"Exactly 6 months in the future" from an arbitrary timestamp is not well-defined, even when assuming a fixed time zone. What is it supposed to mean?
13 more years to go until the 2038 problem.
Surely we'll have everything patched up by then..
wow that is dedication 1970-01-01! :-)
It worries me how blasé we seem to be to the 2038 problem.
I wonder if people will still be repeating the "Y2k myth" myth as things start to fail.
People are doing things[0]. We'll see closer to the date what's left, I suppose.
[0] https://en.wikipedia.org/wiki/Year_2038_problem#Implemented_...
Almost exactly 13 years, in fact!
The overflow happens at 2038-01-19T03:14:08Z.
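That instant follows directly from the range of a signed 32-bit counter. A small sketch of the arithmetic, assuming time_t is a signed 32-bit count of seconds since the epoch (the variable names are illustrative):

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

# Last second a signed 32-bit time_t can represent, and the one after it.
last_representable = EPOCH + timedelta(seconds=INT32_MAX)
first_overflowing = EPOCH + timedelta(seconds=INT32_MAX + 1)

# What a signed 32-bit counter holds after one more tick
# (two's-complement wraparound), and the date that value maps to.
wrapped = ctypes.c_int32(INT32_MAX + 1).value
wrapped_date = EPOCH + timedelta(seconds=wrapped)

print(last_representable.isoformat())
print(first_overflowing.isoformat())
print(wrapped, wrapped_date.isoformat())
```

The last representable second is 03:14:07Z, and a counter that wraps lands back in December 1901.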
The concept of a process-wide locale was a mistake. All locale-dependent functions should be explicit. Yes, that means some programs won't respect your locale because the author didn't care to add support, but at least they won't break in unexpected ways because some functions magically work differently between the user's and the developer's systems.
Totally agree. Python's gettext() API feels so ancient because it can only cope with one locale at a time, and it would love to get that locale from an environment variable. Not ideal for writing an HTTP service that sends text based on the Accept-Language header.
It was a very reasonable design when most programs were local-only.
It really wasn't. Even local-only programs need to process data that isn't formatted in the user's locale.
A thread-local locale you can easily save and restore would work. In other words, dynamically scoped.
But you don't want to be processing data in locale-dependent ways using the crap available in ISO C.
The headline doesn’t match the article. As it points out, C++20 has a very nice, and portable, time library. I quibble with the article here, though: in 2025, C++20 is widely available.
Indeed. The article should be retitled "C still useless in 2025, including time handling".
It would be incorrect, but it's already incorrect as what they're doing isn't really a struggle, so I guess the net result is neutral?
Damn, I didn't notice that C++20 added a whole bunch of new features to the std::chrono library! Nice!
I think that time handling is the hardest thing in the world of programming.
Explanation: you can learn heap sort or FFT or whatever algorithm there is and implement it. But writing your own calendar from scratch, one that will for example run a cron job at 3 am on the day of a DST transition and that works in every TZ, is work for many people and many months, if not years...
Time handling is exceptionally easy. Time zone handling is hard. It doesn't help that the timezone database isn't actually designed to make this any easier.
Meanwhile I edited my comment but we're still agreeing. And adding them for example to embedded systems is additional pain. Example: tram or train electronic boards / screens
I don’t know. I’ve written what seemed like obviously simple code that got tripped up by the 25 hour day on DST transition. That’s when I learned to stick to UTC.
Debian’s vixie-cron had a bug [0] where if the system TZ was changed without restarting crond, it would continue to run jobs based on the old TZ. It checked for DST transitions, but not TZ.
In fairness, it’s not something that should happen much at all, if ever.
[0]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019716
Assuming the unstated requirement that you want your cron job to only run once per day, scheduling for 3 am is not a software problem. It's a lack of understanding by the person problem. By definition times around the time change can occur twice or not at all. Also, in the US 3am would never be a problem as the time changes at 2 am.
Also, naming things, cache coherency, and off by one errors are the two hardest problems in computer science.
I used the ICU packages when I needed to do something like this but it's been a decade since I coded in C++.
The Abseil time library makes time and date parsing and manipulation a lot nicer in C++: https://abseil.io/docs/cpp/guides/time
Fun fact, http 1 used to pass expirations and dates in string format.
[Missing scene]
" We are releasing Http1.1 specifications whereby expirations are passed as seconds to expire instead of dates as strings."
For those skimming, the problem is that mktime() interprets its input as local time, and they want it treated as UTC. So you need to subtract the timezone offset used, but the offset varies with the date you feed mktime() and there is no easy way to determine it.
If you are happy for the time to perhaps be wrong around the hours when the timezone offset changes, this is an easy hack:
import time

def time_mktime_utc(_tuple):
    result = time.mktime(_tuple[:-1] + (0,))
    return result * 2 - time.mktime(time.gmtime(result))
If you are just using it for display this is usually fine, as time zone changes are usually timed to happen when nobody is looking.
It is easier in Python:
>>> from email.utils import parsedate_tz, mktime_tz
>>> mktime_tz(parsedate_tz("Fri, 17 Jan 2025 06:07:07"))
1737094027
It converts RFC 2822 time into a POSIX timestamp ([mean solar] seconds since epoch--elapsed SI seconds not counting leap seconds).
And the answer is to use `gmtime()`, which AIX doesn't have and which Windows calls something else, but, whatever, if you need to support AIX you can use an open source library.
AIX has gmtime [0], too. Since at least 7.1.
[0] https://www.ibm.com/docs/en/aix/7.1?topic=c-ctime-localtime-...
That is not really the problem.
mktime() parses the time string which lacks any information on time zones
then the article uses timegm() to convert it to unixtime on the assumption that it was in UTC
also it's about C
> mktime() parses the time string
No, mktime() doesn't parse a string. Parsing the string is done by strptime(). mktime() takes the output of strptime(), which is a C structure or the equivalent in Python - a named tuple with the same fields.
> the time string lacks any information on time zones
Not necessarily. Time strings often contain a time zone. If the string you happen to be parsing doesn't contain a time zone, you could always append one. If it did have a time zone you could always change it to UTC. So this isn't the problem either.
The root cause of the issue is that the "struct tm" that strptime() outputs doesn't have a field for the time zone, so if the string has one, it is lost. mktime() needs that missing piece of information. It solves that problem by assuming the missing time zone is local time.
> then the article uses timegm() to convert it to unixtime on the assumption that it was in UTC
It does, but timegm() is not a POSIX function so isn't available on most platforms. gmtime() is a POSIX function and is available everywhere. It doesn't convert a "struct tm", but it does allow you to solve the core problem the article labours over, which is finding out what time zone offset mktime() used. With that piece of information it's trivial to convert to UTC, as the above code demonstrates in 2 lines.
> also it's about C
The Python "time" module is a very thin wrapper around the POSIX libc functions and structures. There is a one to one correspondence, mostly with the same names. Consequently any experienced C programmer will be able to translate the above Python to C. I chose Python because it expresses the same algorithm much more concisely.
The first rule of thumb is to never use functions from glibc (gmtime, localtime, mktime, etc) because half of them are non-thread-safe, and another half use a global mutex, and they are unreasonably slow. The second rule of thumb is to never use functions from C++, because iostreams are slow, and a stringstream can lead to a silent data loss if an exception is thrown during memory allocation.
ClickHouse has the "parseDateTimeBestEffort" function: https://clickhouse.com/docs/en/sql-reference/functions/type-... and here is its source code: https://github.com/ClickHouse/ClickHouse/blob/74d8551dadf735...
I came to make the thread-safety comment. Got bit by that myself formatting ISO 8601, would get wrong output... Sometimes.
I won't believe anyone who tells me that handling time in c/c++ isn't perilous.
> give us some truly excellent code that we really don’t deserve
Why such self flagellation?
This comment section is so nerdy I love it.
[flagged]
[flagged]
Jeez, read the article. C++20 has such an elegant solution that it has him swooning.
Being truly luxurious, the tz library supports using not just your operating system’s time zone databases, which might lack crucial leap second detail, but can also source the IANA tzdb directly. This allows you to faithfully calculate the actual duration of a plane flight in 1978 that not only passed through a DST change, but also an actual leap second. I don’t swoon easily, but I’m swooning.
Was waiting for rustaceans to come and bash c/cpp.. didn’t take long
They have five comments ever. What makes you think they use rust?
emcell is the name of a rust crate
And the fox said: "These grapes are sour".
That makes no sense in this context. What's the situation you're imagining where they wanted to use C but something else prevented them so they made up an excuse to call C bad?
You know sometimes people just dislike things, right?
You cannot since it's missing time zone
UTC is a timezone, though. Or am I misunderstanding what you're saying?
That is fine as long as the input / output is always in UTC... but at the end of the day you often want to communicate that timepoint to a human user (e.g. an appointment time, the time at which some event happened, etc.), which is when our stupid monkey brains expect the ascii string you are showing us to actually make sense in our specific locale (including all of the warts each of those particular timezones have, including leap second, DST, etc.)
> That is fine as long as the input / output is always in UTC
But the title specifically says "from a UTC string", so it _is_ a UTC string, always.
> ascii string you are showing us to actually make sense in our specific locale
Locale and TZ are two completely separate things. You can use any locale in any TZ. You can use any locale in any location, too.
That's a localization task, not timekeeping task.
It is not, if what the user expects to stay constant is their local calendar/wall clock time rather than the UTC instant. Which is usually the case. This is a transformation that needs late binding as DST and timezone rules can change, so it can't just be handled as a localisation transformation on input/output
UTC would be marked as +Z
Without any marking, it could be anything
The article explicitly mentions UTC all over the place. It's UTC.
No '+'.
Noon UTC is "12:00Z".
One factor complicates things a bit: this is one way to encode that the timezone is UTC, but +00:00 is also commonly used for example.
RFC 3339 is nice for this reason. Always UTC and terminated with a Z.
RFC 3339 isn't always UTC and doesn't mandate Z, it only removes some extra flexibility of ISO 8601, like the comma separator or short syntax.
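A quick sketch of why the distinction between "Z", "+00:00" and other offsets matters less once the offset is actually parsed. It relies on Python's %z directive accepting a literal Z as well as numeric offsets (3.7 and later). The helper name and format constant are illustrative, and fractional seconds are left out.

```python
from datetime import datetime

RFC3339_SECONDS = "%Y-%m-%dT%H:%M:%S%z"   # no fractional seconds in this sketch

def rfc3339_to_epoch(text):
    # %z accepts "Z", "+00:00" and other numeric offsets.
    return int(datetime.strptime(text, RFC3339_SECONDS).timestamp())

samples = [
    "2025-01-20T06:07:07Z",
    "2025-01-20T06:07:07+00:00",
    "2025-01-20T07:07:07+01:00",   # same instant, written with a +01:00 offset
]
stamps = [rfc3339_to_epoch(s) for s in samples]
print(stamps)
```

All three strings describe the same instant, so the three timestamps should be identical.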
Until you understand that the core of unix time is the "day", in the end, you only need to know the first leap year (If I recall properly it is 1972), then you have to handle the "rules" of leap years, and you will be ok (wikipedia I think, don't use google anymore since they now force javascript upon new web engines).
I did write such code in RISC-V assembly (for a custom command line on linux to output the statx syscall output). Then, don't be scared, with a bit of motivation, you'll figure it out.
The core of the UNIX time is seconds since epoch, nothing else. 'Day' has no special place at all. There are calendars for converting to and from dates, including Western-style, but the days in those calendars vary in length because of daylight saving switches and leap seconds for example.
UNIX time ignores leap seconds, so every day is exactly 86400 seconds, and every year is either 365*86400 or 366*86400 seconds. This makes converting from yyyy-mm-dd to UNIX time quite easy, as you can just do `365*86400*(yyyy-1970) + leap_years*86400` to get to yyyy-01-01.
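The formula can be checked mechanically. In this sketch leap_years is taken to mean the number of leap years from 1970 up to, but not including, yyyy. That reading is an assumption, since the comment does not spell it out, and the function names are illustrative.

```python
import calendar

def is_leap(year):
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_start(yyyy):
    # Unix timestamp of yyyy-01-01T00:00:00Z, using the formula from the comment.
    leap_years = sum(1 for y in range(1970, yyyy) if is_leap(y))
    return 365 * 86400 * (yyyy - 1970) + leap_years * 86400

# Cross-check against the standard library for a few centuries.
mismatches = [y for y in range(1970, 2401)
              if year_start(y) != calendar.timegm((y, 1, 1, 0, 0, 0))]
print(year_start(2025), mismatches)
```

Getting from yyyy-01-01 to an arbitrary date then only needs the cumulative day count of the preceding months, plus one day after February in a leap year.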
Well yes, in the sense that not all Unix epoch seconds are equally long...
Yeap, this is why I said it is kind of easy.
Until you properly know the leap years. Leap year rules in the long run are a bit funky. Just have a look at wikipedia.
(do not use gogol search since they are now forcing javascript by default)
You are perfectly wrong, the day is the main calendar object related to the epoch seconds.
I wrote conversion code, I know what I am talking about.