OpenAI's latest o3 and o4-mini models perform exceptionally well in coding and math, but they tend to hallucinate more frequently
OpenAI's o3 and o4-mini excel in coding and math but hallucinate more often, at rates of 33% and 48% on PersonQA.

OpenAI has introduced its latest artificial intelligence models, o3 and o4-mini, which demonstrate significant advances in coding, math, and multimodal reasoning, positioning them among the company's most capable tools to date. Despite these achievements, however, a critical issue has come to light: the models hallucinate more often than their predecessors. TechCrunch reports that o3 shows a hallucination rate of 33% on OpenAI's PersonQA benchmark, while o4-mini's rate rises to 48%.
This development reverses a trend that had characterized the progress of OpenAI's models, in which prior iterations such as o1, o1-mini, o3-mini, and GPT-4o showed steadily declining hallucination rates. The reasons behind the regression remain unclear; OpenAI's own researchers acknowledge that more investigation is needed into why scaling up reasoning models could exacerbate the problem. Neil Chowdhury, a researcher at Transluce and former OpenAI employee, posited that the reinforcement learning techniques used to train these models may amplify issues that were previously mitigated.
The implications of these hallucinations are significant wherever accuracy is paramount, such as in the legal and financial sectors. Sarah Schwettmann of Transluce notes that the higher hallucination rates could limit the models' usefulness in real-world applications, and Stanford's Kian Katanforoosh highlights the problem in coding tasks, where o3 often generates non-functional website links, a substantial risk to business operations that rely on its output.
To address these challenges, OpenAI acknowledges that improving accuracy and reliability remains an ongoing effort. One potential remedy under exploration is integrating web search capabilities: GPT-4o reaches 90% accuracy on OpenAI's SimpleQA benchmark when search is enabled, suggesting that grounding responses in verifiable information can markedly improve factuality, an especially valuable trait when user trust and utility are at stake.
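The grounding idea behind that result can be illustrated with a minimal, purely hypothetical sketch (all function names and data here are invented for illustration and are not OpenAI's API): a model that only asserts an answer when a retrieved snippet supports it, and abstains otherwise.

```python
# Hypothetical sketch of retrieval grounding: the "model" may only state
# answers it can support with a retrieved snippet; otherwise it abstains
# rather than hallucinating. All names are invented for illustration.

def retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    """Toy 'web search': return snippets sharing at least one word with the query."""
    words = set(query.lower().split())
    return [text for text in corpus.values()
            if words & set(text.lower().split())]

def grounded_answer(query: str, candidate: str, corpus: dict[str, str]) -> str:
    """Emit the candidate answer only if a retrieved snippet contains it."""
    snippets = retrieve(query, corpus)
    if any(candidate.lower() in s.lower() for s in snippets):
        return candidate
    return "I don't know"  # abstain instead of guessing

corpus = {
    "benchmark": "On the PersonQA benchmark o3 hallucinated in 33 percent of answers.",
}

print(grounded_answer("o3 PersonQA hallucination rate", "33 percent", corpus))
print(grounded_answer("o3 PersonQA hallucination rate", "5 percent", corpus))
```

The key design choice is the abstention path: a grounded system trades some coverage (answering "I don't know") for a lower rate of confidently stated falsehoods, which is the trade-off the SimpleQA result points to.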
The AI industry's focus is shifting toward reasoning models because they can handle complex tasks without requiring substantially more training data or compute. Yet, as the case of o3 and o4-mini illustrates, this promising direction presents its own hurdles, notably the risk of hallucination, which threatens the credibility and real-world application of such advanced AI systems.
Sources: TechCrunch, TechSpot